PUBLISHERS’ NOTE


The series in which this title appears was introduced by the 
publishers in 1957 and is under the general editorship of Dr. Maurice 
G. Kendall. It is intended to fill a need which has been evident 
for some time and is likely to grow — the need for some form of 
publication at moderate cost which will make accessible to a 
group of readers specialized studies in statistics or special courses 
on particular statistical topics. There are numerous cases where, 
for example, a monograph on some newly developed field would be 
very useful, but the subject has not reached the stage where a 
comprehensive book is possible; or, again, where a course of study 
is desired in a domain not covered by textbooks but where an 
exhaustive treatment, even if possible, would be expensive and 
perhaps too elaborate for the readers’ needs. 


Considerable attention has been given to the problem of producing
these books speedily and economically. Appearing in a cover
the design of which will be standard, the contents of each volume
will follow a simple, straightforward layout, the text production
method adopted being suited to the complexity or otherwise of the
subject.


The publishers will be interested in approaches from any authors 
who have work of importance suitable for the series. 


CHARLES GRIFFIN & CO. LTD. 
PREFACE


The essence of mathematics is to take a given set of facts and to 
deduce their consequences. In the problems discussed in this 
monograph the initial facts are properties of the distributions of 
random variables, and the consequences which are of interest are the 
bounds which may be placed on the probability of the variables 
taking values belonging to some given set. The extreme situations 
are the one in which the distributions are completely determined and 
the probability is also completely determined, and that in which 
nothing is known about the distributions and the probability may be 
any number between 0 and 1 inclusive. Between these extremes lie 
the cases in which, from some knowledge of the distributions, we 
can say something which is not trivial about the probability. The 
known facts about a distribution may be numerical, e.g. that it has 
certain moments taking certain values, or geometrical, e.g. that it has 
a single mode or that the graph of its probability density function is 
smooth according to some criterion. The type of fact which is taken 
as known and the type of set considered are indicated in the chapter 
and section headings. Page references in the bibliography act as an 
index of names. 


The concept of convexity is found to be a valuable way of 
unifying much of the work, and in Chapter I an account is given of 
the ideas and results which are needed subsequently. In Chapter 
II we deal with the univariate distributions for which the data are 
expectations of various functions. When these functions do not 
depend on the distribution function it is possible to give a complete 
solution of the problem (though one which, in practice, may still 
involve some complicated calculation). The case of the mean range,
which is expressible as the expectation of a function of the
distribution function, is considered as a special case in Section 2.10. In
Chapter III we consider univariate distributions for which
geometrical data are given. The method used in Chapter II can be
adapted for many cases here also, though it fails in the case of the
range or when we are given restrictions on the magnitude of
functions. In Chapter IV we deal with multivariate distributions;
we confine ourselves to the case when the data are second-order 
moments, and even with this simplification it appears that the 
computational difficulties involved in finding a best possible solution 
are formidable. In Chapter V we consider not single variables, but 
sums of variables; here no general methods seem available and various 
problems are treated by ad hoc methods which yield results far from 
best possible. Finally, in Chapter VI there are some notes on the 
applicability of the results obtained in earlier chapters from the 
point of view of their value as sources of problems to the pure 
mathematician and for application by the statistician. The reader 
may find it useful to look at this chapter before going on to the 
detailed ones which precede it. 


Throughout the monograph the emphasis is on methods and 
solutions which lead to definite numerical bounds. Some recent 
work has been concerned with generalizing the earlier ideas, but 
where these generalizations do not give rise to concrete results they 
are mentioned only briefly. 


At the end of the monograph is a set of exercises; it seems convenient
to refer in these to work which, with hindsight, we may
regard as elementary. It is not to be considered that the efforts of 
pioneers are being disparaged by being treated in this way. 


1963 H. J. GODWIN 
CHAPTER I


PRELIMINARIES 


1.1 Notation 

Since there is considerable variation among different writers in 
the matter of notation, we summarize here the notation which we 
shall use. 


$(x_1, \dots, x_n)$ denotes a random variable from an $n$-dimensional
population; when $n = 1$ we drop the suffix.
$F(x_1, \dots, x_n)$ is the distribution function of the population, i.e.
$F(X_1, \dots, X_n)$ is the probability that $x_1 \le X_1, \dots, x_n \le X_n$.
If the $x_i$ take discrete values then $F$ is a step-function; if the $x_i$
take continuous sets of values then
$\partial^n F / (\partial x_1 \cdots \partial x_n)$ is called the p.d.f. (probability
density function) and is denoted by $f(x_1, \dots, x_n)$. It may be noted in
passing that the distinction between discrete and continuous 
distributions is of little importance in the present work since a 
distribution of either type can be approximated to arbitrarily closely 
by one of the other type, and so we obtain no improvement in 
inequalities by confining ourselves to one type or the other. 

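The expectation of the function $\phi(x_1, \dots, x_n)$, written
$E(\phi(x_1, \dots, x_n))$, is

$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \phi(x_1, \dots, x_n) \, dF(x_1, \dots, x_n)$$

where, to cover both the discrete and continuous cases, we interpret
the integral in the sense of Stieltjes. If $f(x_1, \dots, x_n)$ exists we can
write the integral as

$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \phi(x_1, \dots, x_n) \, f(x_1, \dots, x_n) \, dx_1 \cdots dx_n.$$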


For $n \ge 2$ we shall be concerned only with second-order
moments; unless the contrary is specifically stated we shall assume
that $E(x_i) = 0$ $(i = 1, \dots, n)$ and we shall write $E(x_i^2) = \sigma_i^2$,
$E(x_i x_j) = \rho_{ij} \sigma_i \sigma_j$.


For $n = 1$ we denote $E((x - a)^r)$ by $\mu_r(a)$. $\mu_r(0)$ is written $\mu_r'$,
and $\mu_r(\mu_1')$ is written simply $\mu_r$. These are ordinary moments;
absolute moments $E(|x - a|^r)$ are denoted by $\nu_r(a)$. We do not
count $\mu_0$ or $\nu_0$, which are both identically unity, among the moments;
by the first $r$ moments we mean $\mu_1, \dots, \mu_r$. $\nu_r(0)$ is denoted simply
by $\nu_r$; note that this convention is different for ordinary moments and
for absolute moments: in the former case the absence of the
brackets and prime means that we are taking moments about the
mean of the population, but in the latter case we are taking them
about the origin. Note also that the absolute moments $\nu_r(a)$ for the
population with p.d.f. $f(x)$ are the same as the ordinary moments
$\mu_r(a)$ for the population with p.d.f. $0$ for $x < a$, $f(x) + f(2a - x)$
for $a < x$, so that we could express any results in terms of absolute
moments in terms of ordinary moments for a population of this
kind. (To put the matter in geometrical terms, we can "fold"(*) the
distribution about the line $x = a$, and measurement to the right of
$a$ is equivalent to measurement in both directions from $a$ for the
unfolded distribution.)

We shall be concerned with the problem of finding bounds for
$P(T)$, i.e. the probability that $(x_1, \dots, x_n)$ lies in a certain set $T$;
we denote by $L(T)$ and $U(T)$ the infimum and supremum respectively
of $P(T)$, under a given set of conditions on the distribution.
A statement of the form "$U(T)$ equals something" means that a best
possible upper bound has been found for the probability; otherwise
we may have to be content with a statement of the form "$U(T)$ is
less than or equal to something".


1.2 Convexity 

A considerable unification of the results which follow can be 
achieved by the use of the ideas of the theory of convex sets. In this 
section we give an account of only those propositions which we shall 
need subsequently, leaving it to the interested reader to pursue the 
subject further in books such as that by Eggleston (1958). 

One definition of a convex set in $n$-dimensional Euclidean space
is that if the points $x\,(x_1, \dots, x_n)$ and $y\,(y_1, \dots, y_n)$ lie in the
set, then so do all the points $tx + (1 - t)y$ for $0 \le t \le 1$. In geometrical
terms, if the set contains the end-points of a straight line of finite
length then it contains all its intermediate points.

(*) Peek (1933) used the terminology of "folding" a distribution.

Two convex sets with no point of one in the interior of the other
can be separated by a hyperplane, i.e. there exists a linear form
$a_1 x_1 + \dots + a_n x_n$ which is non-negative at all points of one set
and non-positive at all points of the other. If neither the sets nor
their frontiers have points in common then there is a hyperplane
which separates them strictly, so that $a_1 x_1 + \dots + a_n x_n$ is positive
for all points in one set and negative for all points in the other.

A hyperplane which contains at least one point of the frontier 
of the convex set S but is such that there is no point of S in one of 
the open half-spaces separated by the hyperplane is a support plane 
of S. For a bounded set (i.e. one such that the coordinates of all 
points of the set are bounded) there exist two support hyperplanes 
in every direction, but this need not be so for an unbounded set 
(e.g. a half-space). 

In the next chapter we shall be interested in a special case which
can be stated as follows. The convex set $S$ which we consider is the
intersection of the half-spaces

$$l(\theta)(x) \ge 0$$

where $l(\theta)(x) = l_1(\theta) x_1 + \dots + l_n(\theta) x_n$. $\theta$ is some index which
takes a set of values which need not be enumerable nor even
one-dimensional. We are interested in the values of the linear form
$\psi(x) = A_1 x_1 + \dots + A_n x_n$ which is known to be non-negative at
all points of $S$ and to take the value one at a point of $S$. If $\psi_0$ is the
infimum of $\psi$ in $S$ then $\psi = \psi_0$ at at least one point $P$ of $S$ (since $S$
is closed), and $P$ cannot lie in the interior of $S$ since we could
decrease $\psi$ by moving from $P$ in a suitable direction while still
remaining in $S$. Since no point at which $\psi < \psi_0$ lies in $S$, the
hyperplane $\psi = \psi_0$ is a support hyperplane of $S$. Now let $d(\theta)$
be the distance of $P$ from the hyperplane $l(\theta)(x) = 0$; if
$\inf d(\theta) > 0$ then $P$ must lie in the interior of $S$. Hence $\inf d(\theta) = 0$,
and if we add to the set $\Lambda$ of hyperplanes $l(\theta)(x) = 0$ the limiting
cases of convergent sequences of these hyperplanes (i.e. if we take
the closure $\bar{\Lambda}$ of the set $\Lambda$) then $P$ must lie on at least one
hyperplane of $\bar{\Lambda}$. Since any $n + 1$ hyperplanes through a point in
$n$-dimensional space are linearly dependent we can express
$\psi(x) - \psi_0$ in the form

$$\sum_{i=1}^{k} \lambda_i \, l(\theta_i)(x)$$

where $k \le n$ and the summation may include hyperplanes from $\bar{\Lambda}$.
Now if $\lambda_1 < 0$ we can find a point $y$ for which $l(\theta_1)(y) > 0$,
$l(\theta_2)(y) = \dots = l(\theta_k)(y) = 0$, so that $y$ lies in $S$ and yet
$\psi(y) < \psi_0$. This is impossible and so $\lambda_1 \ge 0$, and similarly
$\lambda_2, \dots, \lambda_k$ are all non-negative.


CHAPTER II 


UNIVARIATE DISTRIBUTIONS: 
NUMERICAL DATA 


2.1 Introduction 

In this chapter we deal with the most extensively studied case 
and shall show that an almost complete solution is possible, though 
one which may still present formidable computational difficulties. 
This is when the data are moments and $T$ is one or more intervals
(possibly extending to $+\infty$ or $-\infty$). We first note the restrictions
which must be placed on a set of numbers if they are to arise as 
moments of a distribution and then show how, if a set satisfying the 
restrictions is given, we can obtain the quantities U and L. Some 
algebraic solutions for simple forms of T are then obtained. We 
next consider the case when we are given expectations of more 
general functions than powers of the variable. Finally we consider 
the case of the mean range, which can be represented as the expecta- 
tion of a function of the distribution function and so is of quite a 
different type from the other expected values which are used. 


2.2 Properties of moments 
Since a distribution function is required to be non-decreasing,
it is not possible to assign arbitrary values to a set of moments and
then find a distribution which will give these values. The consideration
of the relations which must exist between moments was one
which occupied analysts for many years, and in this monograph we
shall discuss only a few points which have relevance to what follows.
For further information on the subject we refer the reader to Shohat 
and Tamarkin (1943). 

Suppose that $\mu_1, \dots, \mu_{2n}$ are given for a distribution; if we replace
$f(x)$ by $(1 - \epsilon^{2n+1}) f(x)$ with additional probability $\epsilon^{2n+1}$ at
$a/\epsilon$ then we obtain a distribution with moments $\mu_r^*$ where

$$\mu_r^* = \mu_r + O(\epsilon) \quad \text{for } r \le 2n,$$
$$\mu_{2n+1}^* = \mu_{2n+1} + a^{2n+1} + O(\epsilon),$$
$$|\mu_r^*| \to \infty \text{ as } \epsilon \to 0 \quad \text{for } r \ge 2n + 2,$$

provided that $a \ne 0$. By considering what happens as $\epsilon$ tends to zero,
we see that $\mu_{2n+1}$ is independent of the choice of $\mu_1, \dots, \mu_{2n}$. We
shall call the process of changing a distribution in this way "adding
zero probability at infinity" and shall use it in later work. If we
proceed similarly starting with $\mu_1, \dots, \mu_{2n-1}$ then we add $a^{2n}$ to
$\mu_{2n}$, and so $\mu_{2n}$ can take arbitrarily large values for given
$\mu_1, \dots, \mu_{2n-1}$ but not arbitrarily small ones (it cannot, of course, be
negative). By consideration of the fact that

$$\int_{-\infty}^{\infty} (b_0 + b_1 x + \dots + b_n x^n)^2 \, dF(x) \tag{2.2.1}$$

is a non-negative form in $b_0, \dots, b_n$, we have (see, e.g. Mirsky
(1955), p. 400, for the relevant algebraic theorem) that

$$\begin{vmatrix} 1 & \mu_1 & \cdots & \mu_n \\ \mu_1 & \mu_2 & \cdots & \mu_{n+1} \\ \vdots & \vdots & & \vdots \\ \mu_n & \mu_{n+1} & \cdots & \mu_{2n} \end{vmatrix} \ge 0. \tag{2.2.2}$$

If the determinant on the left-hand side of (2.2.2) is zero then the
form in (2.2.1) is semi-definite and can be zero without vanishing
identically; this means that $b_0, \dots, b_n$ (not all zero) exist such that

$$\int_{-\infty}^{\infty} (b_0 + b_1 x + \dots + b_n x^n)^2 \, dF(x) = 0.$$

This can be so only if $dF(x)$ is zero except when $x$ is a zero of the
polynomial $b_0 + b_1 x + \dots + b_n x^n$ so that the distribution consists
of a finite number of discrete probabilities. If the number of points
at which $dF(x)$ is non-zero is less than $n$, the distribution is said to
be degenerate.
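Condition (2.2.2) is easy to test numerically for a proposed set of moments. The following is a minimal sketch, not part of the original text: the function name `hankel_det` and the three sample moment sets are illustrative assumptions, and exact rational arithmetic is used so that a zero determinant is recognised exactly.

```python
from fractions import Fraction

def hankel_det(moments):
    """Determinant of the (n+1) x (n+1) matrix [mu_{i+j}] with mu_0 = 1.

    `moments` lists mu_1, ..., mu_2n (hypothetical helper, for illustration).
    """
    mu = [Fraction(1)] + [Fraction(m) for m in moments]
    n = len(moments) // 2
    rows = [[mu[i + j] for j in range(n + 1)] for i in range(n + 1)]
    det = Fraction(1)
    # Gaussian elimination in exact rational arithmetic.
    for col in range(n + 1):
        pivot = next((r for r in range(col, n + 1) if rows[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            rows[col], rows[pivot] = rows[pivot], rows[col]
            det = -det
        det *= rows[col][col]
        for r in range(col + 1, n + 1):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, n + 1):
                rows[r][c] -= factor * rows[col][c]
    return det

normal = hankel_det([0, 1, 0, 3])                    # unit normal moments
two_point = hankel_det([0, 1, 0, 1])                 # mass 1/2 at -1 and at +1
impossible = hankel_det([0, 1, 0, Fraction(1, 2)])   # mu_4 < mu_2 squared
```

The first set gives a positive determinant, the second gives zero (a discrete distribution on the zeros of $x^2 - 1$), and the third gives a negative value, so no distribution can have those moments.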

We shall be interested later in using moments to estimate probabilities,
and to that extent we may ask whether the above statements
have converses; in other words, whether a set of moments
satisfying (2.2.2) determines a distribution. The answer is that at
least one distribution is determined but that there may be no unique
solution. In order to ensure uniqueness we require further conditions
such as that the series

$$\sum_{n=1}^{\infty} \mu_{2n}^{-1/(2n)}$$

diverge (see Shohat and Tamarkin (1943), p. 20). This means
roughly that the moments must not be too large or that the distribution
must not be too spread-out. (If the distribution is contained in
a finite interval such that $f(x) = 0$ for $|x| > A$ then
$\mu_{2n} < (2A)^{2n}$ and

$$\sum_{n=1}^{\infty} \mu_{2n}^{-1/(2n)}$$

diverges; for the normal distribution with unit variance we have
$\mu_{2n} = (2n - 1) \cdots 3 \cdot 1 < 2^n \, n!$ and again

$$\sum_{n=1}^{\infty} \mu_{2n}^{-1/(2n)}$$

diverges.) If the distribution is widely spread-out we can add to
$f(x)$ a multiple of a function such as
$\exp(-|x|^{1/2}) \cos(|x|^{1/2})$, all
of whose moments are zero, and still obtain a non-negative function
which may be taken to be a p.d.f. For example, if

$$f_1(x) = \tfrac14 \exp(-|x|^{1/2})$$

$$f_2(x) = \tfrac14 \exp(-|x|^{1/2}) \, (1 + \cos(|x|^{1/2}))$$

then all moments of $f_1(x)$ are equal to the corresponding moments of
$f_2(x)$ (see Kendall and Stuart (1958, 1963), p. 109), but

$$\int_{-\pi^2/4}^{\pi^2/4} f_1(x) \, dx = 0.4656$$

$$\int_{-\pi^2/4}^{\pi^2/4} f_2(x) \, dx = 0.7328$$

Using moments alone, we could never approximate to either
probability with an error less than 0.2672....
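The two probabilities quoted above can be checked by direct numerical integration. The sketch below is an illustration only: the helper `integrate_symmetric`, the step count, and the closed forms (obtained here by integrating by parts after the substitution $x = u^2$, not quoted from the text) are assumptions of this example.

```python
import math

def f1(x):
    return 0.25 * math.exp(-math.sqrt(abs(x)))

def f2(x):
    r = math.sqrt(abs(x))
    return 0.25 * math.exp(-r) * (1.0 + math.cos(r))

def integrate_symmetric(f, limit, steps=20000):
    """Integrate an even function over (-limit, limit).

    The substitution x = u**2 removes the square-root kink at the origin;
    the composite Simpson rule is then applied on 0 <= u <= sqrt(limit).
    """
    top = math.sqrt(limit)
    h = top / steps
    total = 0.0
    for i in range(steps + 1):
        u = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * f(u * u) * 2.0 * u    # dx = 2u du
    return 2.0 * total * h / 3.0                # factor 2 covers both half-lines

limit = math.pi ** 2 / 4
p1 = integrate_symmetric(f1, limit)
p2 = integrate_symmetric(f2, limit)

# Closed forms derived for this sketch (integration by parts).
closed_p1 = 1 - (1 + math.pi / 2) * math.exp(-math.pi / 2)
closed_gap = (math.pi / 4 + 0.5) * math.exp(-math.pi / 2)
```

Here `p1` and `p2` agree with the quoted values to four decimal places, and `p2 - p1` matches `closed_gap`, the amount by which two distributions with identical moments disagree on this interval.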
2.3 The application of convexity

Suppose that we are given the expected values $H_1, \dots, H_k$ of the
functions $h_1(x), \dots, h_k(x)$ and that these expected values can actually
be obtained with some distribution.

We consider the linear form

$$g(x) = a_0 + a_1 h_1(x) + \dots + a_k h_k(x)$$

and the values of $a_0, \dots, a_k$ for which

$$g(x) \ge \chi_T(x), \tag{2.3.1}$$

where $\chi_T(x)$ is 1 if $x$ belongs to $T$ and 0 otherwise. We call $\chi_T(x)$ the
characteristic function of $T$.

The set of points $a\,(a_0, \dots, a_k)$ is a convex set $A$ of the type
discussed in Section 1.2. The closure of the set of hyperplanes
defining $A$ is obtained by replacing $\chi_T(x)$ by $\chi_T^*(x)$, where
$\chi_T^*(x) = \max\{\chi_T(x + 0), \chi_T(x - 0)\}$ and is the characteristic
function of the set $T^*$ formed from $T$ by including the end-points of
intervals in $T$. We shall have a different value of $U$ (or $L$) for $T^*$
from the value we have for $T$ only if there is non-zero probability at a
point which is in $T^*$ but not in $T$, and this is so only when we have
equality in some inequality such as (2.2.2), so that the distribution is
necessarily discrete. Thus, although it is $T^*$ for which we are actually
going to evaluate $U$ and $L$, the results will normally be applicable to
$T$ also.

If

$$\psi = \psi(a_0, \dots, a_k) = E(g(x)) = a_0 + a_1 H_1 + \dots + a_k H_k$$

then $\psi$ is non-negative in $A$, and at the point $(1, 0, \dots, 0)$ of $A$ we
have $\psi = 1$. Hence if $\psi_0$ is the infimum of $\psi$ over $A$ then we can
express $\psi - \psi_0$ in the form

$$\sum_{i=1}^{r} \lambda_i \bigl(a_0 + a_1 h_1(x_i) + \dots + a_k h_k(x_i) - \chi_T^*(x_i)\bigr)$$

where $r \le k + 1$ and the $\lambda_i$ are non-negative.
On comparing coefficients of $a_0, \dots, a_k$ we have

$$\sum_{i=1}^{r} \lambda_i = 1,$$

$$\sum_{i=1}^{r} \lambda_i h_j(x_i) = H_j \quad (j = 1, \dots, k),$$

$$\psi_0 = \sum_{i=1}^{r} \lambda_i \chi_T^*(x_i).$$

The discrete distribution with probabilities $\lambda_i$ at $x_i$ $(i = 1, \dots, r)$
thus gives the correct expected values and also gives

$$P(T^*) = \sum_{i=1}^{r} \lambda_i \chi_T^*(x_i) = \psi_0$$

so that $\psi_0 \le U$.

But for any distribution giving the expected values $H_1, \dots, H_k$
we have $P(T^*) = E(\chi_T^*(x)) \le E(g(x)) = \psi$. This is true for any
choice of $a_0, \dots, a_k$ satisfying (2.3.1) and so $P(T^*) \le \psi_0$ and
$U \le \psi_0$.

Hence $U = \psi_0$.

Similarly we can find $L$ by taking the supremum of $\psi$, subject
to the conditions $g(x) \le \chi_T(x)$.

We have thus reduced the problem of finding $U$ to that of finding
the curve which is "lowest" (in the sense of representing the function
giving the least expectation) and which lies above a succession of
lines at height zero or one above the $x$-axis. In the examples which
follow we shall show how curves of this kind may be found: for the
moment we note that the distributions which we find to give $U$
(or $L$) are discrete ones, with probability only at the points where
$g = \chi_T$. If, by intelligent guesswork, we can construct a discrete
distribution which gives the required expected values and for which
there exist $a_0, \dots, a_k$ such that (2.3.1) is satisfied, then we have
constructed a support hyperplane to the set $A$ which is in the correct
direction and so is the unique one. Thus our guess may be established
as correct without examining all other cases. This possibility
of showing that a trial solution is the correct one arises in other
applications of convexity such as linear programming; see, for
example, Gale (1960), p. 22.
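The "intelligent guesswork" argument can be tried on a small case. The sketch below is illustrative and not taken from the text: it assumes $h_1 = x$, $h_2 = x^2$, $H_1 = 0$, $H_2 = 1$, the closed set $T = \{|x| \ge 2\}$ (so that $T^* = T$), the trial polynomial $g(x) = x^2/4$ and a three-point trial distribution. The grid test of $g \ge \chi_T$ is a spot check, not a proof.

```python
from fractions import Fraction

ALPHA = Fraction(2)                      # T is the set |x| >= ALPHA (assumed)
H = {1: Fraction(0), 2: Fraction(1)}     # given E(x) and E(x^2)

def chi_T(x):
    return 1 if abs(x) >= ALPHA else 0

# Candidate g(x) = a0 + a1*x + a2*x^2, intended to satisfy g >= chi_T.
a0, a1, a2 = Fraction(0), Fraction(0), 1 / ALPHA**2

def g(x):
    return a0 + a1 * x + a2 * x * x

# Upper bound supplied by g: psi = E(g(x)) = a0 + a1*H1 + a2*H2.
psi = a0 + a1 * H[1] + a2 * H[2]

# Trial discrete distribution {point: probability}.
trial = {
    -ALPHA: 1 / (2 * ALPHA**2),
    Fraction(0): 1 - 1 / ALPHA**2,
    ALPHA: 1 / (2 * ALPHA**2),
}

grid = [Fraction(k, 10) for k in range(-50, 51)]
g_dominates = all(g(x) >= chi_T(x) for x in grid)
total_mass = sum(trial.values())
mean = sum(p * x for x, p in trial.items())
second = sum(p * x * x for x, p in trial.items())
prob_T = sum(p * chi_T(x) for x, p in trial.items())
touches = all(g(x) == chi_T(x) for x in trial)   # mass only where g = chi_T
```

Because the trial distribution reproduces the given expectations, places mass only where $g = \chi_T$, and gives `prob_T == psi`, the value 1/4 is both attained and an upper bound, so it is $U$ for this assumed case.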




The use of the form $g(x)$ is due to Isii (1959a); the use of convexity
has been discussed by Marshall and Olkin (1960b) in the
context of multivariate distributions and by Kingman (1963) who
states the problem in a very general form. Isii, for the case when the
$h_i(x)$ are $x, x^2, \dots, x^{2n}$, also deals with the more difficult question of
whether or not a distribution will actually exist, and shows that
under fairly general conditions this is so, provided the moments
satisfy the necessary conditions mentioned in Section 2.2.


2.4 A numerical example 


As an illustration of the method we consider the case in which $T$
is the pair of intervals $\tfrac12 < |x| < 2$ and we are given $\mu_1 = 0$,
$\mu_2 = 1$, $\mu_4 = 3$. Guttman (1948a) has given a formula which covers
certain cases of this type, with two intervals symmetrical about the mean
and $\mu_3$ unknown, but the example which we are considering now
does not fit into his scheme.

We note that

$$\begin{vmatrix} 1 & 0 & 1 \\ 0 & 1 & \mu_3 \\ 1 & \mu_3 & 3 \end{vmatrix} = 2 - \mu_3^2$$

is positive for a range of values of $\mu_3$, and so we can assume that the
distribution is not discrete, and we can include the end-points
$x = \pm\tfrac12$ and $x = \pm 2$ in $T$ or exclude them as desired. If $\mu_4 = 1$
then we must have $\mu_3 = 0$, and the only possible distribution has
probability $\tfrac12$ at $x = -1$ and $x = 1$. With the given value of $\mu_4$ we
can have this distribution together with zero probability at infinity,
and so there is a distribution with unit probability in $T$. Hence
$U = 1$.

In order to find $L$ we consider polynomials $g(x)$ of the form
$a_0 + a_1 x + a_2 x^2 + a_4 x^4$ satisfying $g(x) \le \chi_T(x)$, i.e. $g(x) \le 0$ for
$|x| \le \tfrac12$ and for $2 \le |x|$, and $g(x) \le 1$ for $\tfrac12 < |x| < 2$
(remembering that in finding $L$ the end-points of the intervals are taken
with the lower value of $\chi_T(x)$; in finding $U$ they would be taken with the
higher value).
We suppose for the moment that $a_4 \ne 0$.

We want to maximize $\psi = E(g(x))$ and we certainly increase $\psi$
if we increase $g(x)$ for all $x$. By increasing $a_0$ we may suppose that
$g(x) = \chi_T(x)$ at some value $x = \alpha$, and then by adding a positive
multiple of $(x - \alpha)^2$ that $g(x) = \chi_T(x)$ for a second value $x = \beta$.
If there were a term in $x^3$ in $g(x)$ we could also add a positive multiple
of $(x - \alpha)^2 (x - \beta)^2$ to get a third value at which $g(x) = \chi_T(x)$, but
this is not possible with the given conditions. Unless $L = 0$ we
shall have $\chi_T(x) = 1$ as one of the values at which $g(x) = \chi_T(x)$;
we may suppose that $\alpha$ is such a value and, by symmetry, that
$\tfrac12 < \alpha < 2$. Since the coefficient of $x^3$ in $g(x)$ is zero and $a_4 \ne 0$
the sum of the roots of $g(x) = \text{constant}$ is zero, and so it would not
be possible for $g(x)$ to have a maximum value 0 at $\beta$ with $\beta < -2$
or $|\beta| < \tfrac12$; we may suppose that $\beta$ is one of $-2$, $-\tfrac12$, $\tfrac12$ or $2$ with
$g(\beta) = 0$, or else that $g(x)$ has a maximum value 1 at $\beta$ with $\beta = -\alpha$
(from consideration of the sum of the roots of $g(x) = 1$). Taking
this last possibility, we have

$$g(x) = 1 - k (x - \alpha)^2 (x + \alpha)^2$$

where $k$ is some positive number, and we require $g(2)$ and $g(-2)$
to be non-positive, i.e.

$$\sqrt{k} \, (4 - \alpha^2) \ge 1 \tag{2.4.1}$$

and $g(\tfrac12)$ and $g(-\tfrac12)$ to be non-positive, i.e.

$$\sqrt{k} \, (\alpha^2 - \tfrac14) \ge 1. \tag{2.4.2}$$

We have to maximize

$$\psi = 1 - k\alpha^4 + 2k\alpha^2 - 3k,$$

subject to (2.4.1) and (2.4.2). The bounds for $k$ given by (2.4.1) and
(2.4.2) are equal for $\alpha^2 = 17/8$; hence we have

$$\sqrt{k} \ge (4 - \alpha^2)^{-1} \quad \text{for } 17/8 \le \alpha^2 < 4$$

and

$$\sqrt{k} \ge (\alpha^2 - \tfrac14)^{-1} \quad \text{for } \tfrac14 < \alpha^2 \le 17/8.$$
Now $\alpha^4 - 2\alpha^2 + 3 = (\alpha^2 - 1)^2 + 2 > 0$ and so $\psi$ decreases as $k$
increases; hence

$$\psi \le 1 - (\alpha^4 - 2\alpha^2 + 3)(4 - \alpha^2)^{-2} = \psi_1, \text{ say, for } 17/8 \le \alpha^2 < 4$$

and

$$\psi \le 1 - (\alpha^4 - 2\alpha^2 + 3)(\alpha^2 - \tfrac14)^{-2} = \psi_2, \text{ say, for } \tfrac14 < \alpha^2 \le 17/8.$$

Now $\partial\psi_1/\partial\alpha = 0$ for $\alpha = 0, \pm 3^{-1/2}$ and so, for
$17/8 \le \alpha^2 < 4$, $\psi_1$ is a decreasing function of $\alpha$ and
$\psi \le \psi_1(\sqrt{17/8}) = 16/225$. Similarly $\psi_2$ is an increasing function
of $\alpha$ for $\tfrac14 < \alpha^2 \le 17/8$ and

$$\psi \le \psi_2(\sqrt{17/8}) = 16/225.$$

When $\alpha^2 = 17/8$ and $k = 64/225$ then

$$g(x) = 1 - (8x^2 - 17)^2 / 225,$$
$$g(\pm 2) = 0, \quad g(\pm\tfrac12) = 0.$$

If $a_4 = 0$ then we take $g(x) = 1 - k(x - \alpha)^2$ and we shall have
either $g(\tfrac12) = 0$ and $g(2) \le 0$ or $g(2) = 0$ and $g(\tfrac12) \le 0$. In the
first case

$$g(x) = 1 - (x - \alpha)^2 (\tfrac12 - \alpha)^{-2} \quad \text{where } \alpha \le 5/4$$

and $\psi = 1 - (1 + \alpha^2)(\alpha - \tfrac12)^{-2}$; this makes $\psi$ negative. In the second
case $g(x) = 1 - (x - \alpha)^2 (2 - \alpha)^{-2}$ and $5/4 \le \alpha$;
$\psi = (3 - 4\alpha)(2 - \alpha)^{-2}$ and is greatest for $\alpha = 5/4$, being then also
negative. Hence $\sup \psi$ is not less than 16/225 and so $L \ge 16/225$.

However, we can construct a discrete distribution with probabilities
$p$ at $\sqrt{17/8}$, $((16/225) - p)$ at $-\sqrt{17/8}$, and $q$, $r$ at
two of the points $\pm 2$, $\pm\tfrac12$, to give the required moments and also
give $P(T) = 16/225$. (For example, $p = (8\sqrt{17} + 6\sqrt{8})/(225\sqrt{17})$,
$q = 37/225$ at $2$, $r = 172/225$ at $-\tfrac12$.) Hence $L \le 16/225$ and so
$L = 16/225$.

Since $g(x)$ is equal to $\chi_T(x)$ for more than four values of $x$ when
$\psi$ attains its maximum, we have that the support hyperplane
$a_0 + a_2 + 3a_4 = L$ is linearly dependent on more than four support
hyperplanes and so is linearly dependent on support hyperplanes
in an infinity of ways. Hence there is an infinity of distributions
which give the value $L$ for $P(T)$, and among these there is just one
symmetrical one (with probabilities 8/225 at $\pm\sqrt{17/8}$, 86/225 at
$\pm\tfrac12$, 37/450 at $\pm 2$). We could, in fact, have simplified the above
working by using the symmetry of the problem since if $f_1(x)$ gives
$P(T) = p_1$ then $f_1(-x)$ and the symmetric frequency function
$\tfrac12 (f_1(x) + f_1(-x))$ both also give $P(T) = p_1$ and satisfy the same
moment conditions.
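The figures of this example can be confirmed in exact arithmetic. The sketch below is an illustration, not part of the original working: it works in terms of $s = x^2$ so that $\sqrt{17/8}$ never has to be evaluated, relies on symmetry for $\mu_1 = 0$, and tests $g \le \chi_T$ only on a finite grid. The helper names are assumptions of this example.

```python
from fractions import Fraction as F

def g_of_square(s):
    """g(x) = 1 - (8x^2 - 17)^2 / 225, written in terms of s = x^2."""
    return 1 - (8 * s - 17) ** 2 / F(225)

def chi_of_square(s):
    """Indicator of 1/2 < |x| < 2, i.e. 1/4 < x^2 < 4 (end-points excluded)."""
    return 1 if F(1, 4) < s < 4 else 0

# Symmetric extremal distribution: total mass on each pair +/-x, keyed by x^2.
pairs = {F(17, 8): F(16, 225), F(1, 4): F(172, 225), F(4): F(37, 225)}

total = sum(pairs.values())
mu2 = sum(m * s for s, m in pairs.items())
mu4 = sum(m * s * s for s, m in pairs.items())
prob_T = sum(m * chi_of_square(s) for s, m in pairs.items())

# Expanding g gives a0 = 1 - 289/225, a2 = 272/225, a4 = -64/225,
# and psi = E(g) = a0 + a2*mu2 + a4*mu4 with mu2 = 1, mu4 = 3.
a0, a2, a4 = 1 - F(289, 225), F(272, 225), -F(64, 225)
psi = a0 + a2 * 1 + a4 * 3

# Spot check of g <= chi_T for x = 0, 0.01, ..., 4 (g is even in x).
below = all(g_of_square(F(k, 100) ** 2) <= chi_of_square(F(k, 100) ** 2)
            for k in range(0, 401))
```

The distribution has unit mass, second moment 1 and fourth moment 3, and `prob_T` equals `psi`, which is the value 16/225 found above.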


Since the polynomial $g(x)$ giving $L$ is equal to $\chi_T(x)$ for
$x = \pm\tfrac12$, $\pm 2$ as well as at $\alpha$ and $\beta$, it follows that we should
have arrived at the same $g(x)$ and so at $L$ by taking $\beta$ as one of the
values $\pm\tfrac12$, $\pm 2$ instead of the value at which $g(x)$ attained maximum
value of unity; it may be verified, however, that the analysis is more
complicated in these cases.


2.5 Single interval; first and second moments given 

To illustrate the technique used above when values are given
algebraically and not numerically, we consider the case when
$\mu_1 = 0$, $\mu_2 = 1$, and $T$ is the interval $-\alpha \le x \le \beta$ with
$0 < \alpha \le \beta$; i.e. $T$ is an interval containing the mean and extending
from the mean at least as far to the right as to the left.

We can satisfy the moment conditions with probability unity at
0 and zero at infinity, and this gives $U = 1$. To find $L$ we take $T$ now
as $-\alpha < x < \beta$ and consider quadratic polynomials
$g(x) = a_0 + a_1 x + a_2 x^2$ satisfying

$$g(x) \le \chi_T(x) = 0 \quad (x \le -\alpha \text{ or } \beta \le x)$$

and

$$g(x) \le \chi_T(x) = 1 \quad (-\alpha < x < \beta).$$

By increasing first $a_0$ until $g(x) = \chi_T(x)$ at $x = \gamma$ and then adding
a positive multiple of $(x - \gamma)^2$ we may suppose that $g(x) = \chi_T(x)$
for $x = \gamma$, $x = \delta$. If $L > 0$ then we may take $-\alpha < \gamma < \beta$ (by
interchanging $\gamma$ and $\delta$ if necessary); since $g(x)$ is quadratic and
therefore $g(x) = 0$ has not more than two real roots, we must have
$\delta = -\alpha$ or $\delta = \beta$.


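If $\delta = -\alpha$ then $g(x) = 1 - (x - \gamma)^2 (\alpha + \gamma)^{-2}$ and
$g(\beta) = 1 - (\beta - \gamma)^2 (\alpha + \gamma)^{-2} \le 0$ so that
$\beta - \gamma \ge \gamma + \alpha$. We have
$\psi = 1 - (1 + \gamma^2)(\alpha + \gamma)^{-2}$ and
$d\psi/d\gamma = 2(1 - \alpha\gamma)(\alpha + \gamma)^{-3}$.
Hence $\psi$ increases for $-\alpha < \gamma < \alpha^{-1}$ and if $2 \le \alpha(\beta - \alpha)$ the
maximum value of $\psi$ is $\alpha^2/(\alpha^2 + 1)$ for $\gamma = \alpha^{-1}$. If, however,
$2 > \alpha(\beta - \alpha)$ then $\psi$ has its greatest value for $\gamma = \tfrac12(\beta - \alpha)$, and
the value is $4(\alpha\beta - 1)/(\alpha + \beta)^2$.

If $\delta = \beta$ then $g(x) = 1 - (x - \gamma)^2 (\beta - \gamma)^{-2}$, and since
$g(-\alpha) \le 0$ we have $\alpha + \gamma \ge \beta - \gamma$.
$\psi = 1 - (1 + \gamma^2)/(\beta - \gamma)^2$ and
$d\psi/d\gamma = -2(1 + \gamma\beta)(\beta - \gamma)^{-3}$.
Since $2\gamma \ge \beta - \alpha \ge 0$, $\psi$ decreases as $\gamma$ increases, and the greatest
value of $\psi$ is for $\gamma = \tfrac12(\beta - \alpha)$, being then
$4(\alpha\beta - 1)/(\alpha + \beta)^2$.

Now

$$\frac{\alpha^2}{\alpha^2 + 1} - \frac{4(\alpha\beta - 1)}{(\alpha + \beta)^2} = \frac{(\alpha\beta - \alpha^2 - 2)^2}{(\alpha^2 + 1)(\alpha + \beta)^2}$$

and so when $\alpha^2/(\alpha^2 + 1)$ is a possible value for $\psi$, i.e.
$\alpha(\beta - \alpha) \ge 2$, then $L = \alpha^2/(\alpha^2 + 1)$;
when $\alpha(\beta - \alpha) < 2$, then $L = 4(\alpha\beta - 1)/(\alpha + \beta)^2$.

We assumed at the start that $L$ was greater than 0; if $\alpha\beta < 1$ then
$\alpha(\beta - \alpha) < 2$ and the above argument gives $L < 0$; this is a
contradiction and so, for $\alpha\beta \le 1$, we have $L = 0$.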
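The three cases for $L$ can be collected into a single function. This is a minimal sketch under the conditions of this section (mean 0, variance 1, $0 < \alpha \le \beta$); the function name `lower_bound` and the sample arguments are illustrative assumptions.

```python
from fractions import Fraction

def lower_bound(alpha, beta):
    """L for T = (-alpha, beta), 0 < alpha <= beta, mean 0 and variance 1."""
    alpha, beta = Fraction(alpha), Fraction(beta)
    if not 0 < alpha <= beta:
        raise ValueError("expected 0 < alpha <= beta")
    if alpha * beta <= 1:
        return Fraction(0)                       # no non-trivial bound
    if alpha * (beta - alpha) >= 2:
        return alpha**2 / (alpha**2 + 1)         # gamma = 1/alpha is admissible
    return 4 * (alpha * beta - 1) / (alpha + beta) ** 2

wide_right = lower_bound(1, 4)             # alpha*(beta - alpha) = 3 >= 2
boundary = lower_bound(1, 3)               # alpha*(beta - alpha) = 2 exactly
symmetric = lower_bound(2, 2)              # alpha = beta
narrow = lower_bound(Fraction(1, 2), 2)    # alpha*beta = 1
```

At the boundary $\alpha(\beta - \alpha) = 2$ the two non-zero formulas coincide, as the identity above requires, so the order in which the branches are tested does not matter there.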

The above result was first given (with a general value for u) 
by Selberg (1940) who used a special method depending on 
Schwarz's inequality. 

If we take $\alpha = \beta$ then we have $L = 1 - \alpha^{-2}$; this means that for the
complement of $T$, i.e. the set $|x| \ge \alpha$, we have $U = \alpha^{-2}$. This result
is one of the oldest in the subject and was discovered by Bienaymé
in 1853. Tchebychef rediscovered it in 1867, and the prestige of the
greater mathematician has resulted in his name being generally
applied to it and to the whole subject which has developed from it.
(The name of "Tchebychef's inequalities" is also given to the
inequalities for the case when $T$ is $x < k$. These were stated without
proof by Tchebychef in 1874 but were proved ten years later by
Markov and Stieltjes independently. We shall consider these in
the next section.)


2.6 Halfline; ordinary moments given 
In this section we take for $T$ the set $0 \le x$ and assume a
knowledge of the moments $\mu_1', \dots, \mu_{2n}'$. (In previous examples we
have chosen the origin and scale so that $\mu_1'$ was 0 and $\mu_2$ was unity;
in the present example it leads to a more symmetrical presentation
if instead we make the origin the end-point of $T$. For applications
with given numerical values, it may be better to revert to the previous
usage.)

It will suffice to determine $U$ since, as we shall see later, the
procedure simultaneously determines $L$.

We consider polynomials $g(x) = a_0 + a_1 x + \dots + a_{2n} x^{2n}$, and
by adding successively negative multiples of expressions such as
$1$, $(x - \alpha)^2$, $(x - \alpha)^2 (x - \beta)^2$, ..., we can suppose that
$g(x) = \chi_T(x)$ for at least $n + 1$ values of $x$, while $g(x) \ge \chi_T(x)$
generally. Since there must be a maximum between each pair of minima of
$g(x)$, and since $g(x)$ cannot have more than $2n - 1$ turning-points, it
follows that $x = 0$ must be one of the points at which $g(x) = \chi_T(x)$. We
now show that there exist a discrete distribution having probability
at 0 and $r$ other points $x_1, \dots, x_r$ $(r \le n)$, and a polynomial of the
required kind such that $g(x) = \chi_T(x)$ only for $x = 0, x_1, \dots, x_r$.

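The matrix

$$M = \begin{pmatrix} 1 & \mu_1' & \cdots & \mu_n' \\ \vdots & \vdots & & \vdots \\ \mu_n' & \mu_{n+1}' & \cdots & \mu_{2n}' \end{pmatrix}$$

is, by hypothesis, that of a positive definite form; let $p$ be such that

$$M_p = \begin{pmatrix} 1 - p & \mu_1' & \cdots & \mu_n' \\ \vdots & \vdots & & \vdots \\ \mu_n' & \mu_{n+1}' & \cdots & \mu_{2n}' \end{pmatrix}$$

has determinant zero. Since

$$\begin{vmatrix} 1 - p & \cdots & \mu_{n-1}' \\ \vdots & & \vdots \\ \mu_{n-1}' & \cdots & \mu_{2n-2}' \end{vmatrix} \times \begin{vmatrix} \mu_2' & \cdots & \mu_{n+1}' \\ \vdots & & \vdots \\ \mu_{n+1}' & \cdots & \mu_{2n}' \end{vmatrix} - \begin{vmatrix} \mu_1' & \cdots & \mu_n' \\ \vdots & & \vdots \\ \mu_n' & \cdots & \mu_{2n-1}' \end{vmatrix}^2 = 0$$

(by the Jacobi Ratio Theorem: see, e.g. Mirsky (1955), p. 25), we
have

$$\begin{vmatrix} 1 - p & \cdots & \mu_{n-1}' \\ \vdots & & \vdots \\ \mu_{n-1}' & \cdots & \mu_{2n-2}' \end{vmatrix} \ge 0.$$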
Similarly every principal minor of $M_p$ has non-negative determinant,
so that $p < 1$, and also $M_p$ is the matrix of a positive
semi-definite form. So too is the matrix

$$\frac{M_p}{1 - p} = \begin{pmatrix} 1 & \mu_1'/(1 - p) & \cdots & \mu_n'/(1 - p) \\ \vdots & \vdots & & \vdots \\ \mu_n'/(1 - p) & \mu_{n+1}'/(1 - p) & \cdots & \mu_{2n}'/(1 - p) \end{pmatrix}.$$

Now, as stated without proof in Section 2.2, there exists a distribution
with moments $\mu_1'/(1 - p), \dots, \mu_{2n}'/(1 - p)$. Since the matrix
$M_p/(1 - p)$ is that of a semi-definite form, there exist also real
numbers $a_0, \dots, a_n$ such that, for this distribution,
$E((a_0 + \dots + a_n x^n)^2) = 0$. Hence the distribution is discrete with
probabilities $q_1, \dots, q_r$ at $x_1, \dots, x_r$ which are real zeros of
$a_0 + \dots + a_n x^n$.

We do not need to discuss the relation between $r$ and $n$, but we may
note in passing that if $r < n$ then $\mu_1', \dots, \mu_{2n}'$ are moments of a
discrete distribution with at most $n$ values. This is possible even if the
determinant of $M$ is positive, since we may change $\mu_{2n}'$ by adding
zero probability at infinity to the discrete distribution which we
have constructed.

Hence there is a discrete distribution with probabilities $p$ at 0,
$(1 - p) q_i$ at $x_i$ $(i = 1, \dots, r)$, with moments $\mu_1', \dots, \mu_{2n}'$.

Suppose that $x_1, \dots, x_s$ are negative, and $x_{s+1}, \dots, x_r$ are positive
($0 \le s \le r$, with $s = 0$ or $s = r$ meaning that all the $x_i$ have the
same sign). Then a polynomial $g(x)$ with values 0 at $x_1, \dots, x_s$,
values 1 at $0, x_{s+1}, \dots, x_r$, and satisfying $g(x) \ge 0$ $(x < 0)$,
$g(x) \ge 1$ $(0 \le x)$, is

$$g(x) = 1 + x (x - x_{s+1})^2 \cdots (x - x_r)^2 \, Q(x)$$

where $Q(x)$ is a polynomial to be determined. (This is for $s > 0$;
if $s = 0$ we may take simply $g(x) = 1$.) We need

$$1 + x_i (x_i - x_{s+1})^2 \cdots (x_i - x_r)^2 \, Q(x_i) = 0$$

and

$$\frac{1}{x_i} + \frac{2}{x_i - x_{s+1}} + \dots + \frac{2}{x_i - x_r} + \frac{Q'(x_i)}{Q(x_i)} = 0 \quad (i = 1, \dots, s).$$

These equations determine $Q(x_i)$ and $Q'(x_i)$ and then

$$Q(x) = \sum_{i=1}^{s} Q(x_i) P_i(x) + \sum_{i=1}^{s} \bigl(Q'(x_i) - Q(x_i) P_i'(x_i)\bigr) P_i(x) (x - x_i),$$

where

$$P_i(x) = \frac{(x - x_1)^2 \cdots (x - x_{i-1})^2 (x - x_{i+1})^2 \cdots (x - x_s)^2}{(x_i - x_1)^2 \cdots (x_i - x_{i-1})^2 (x_i - x_{i+1})^2 \cdots (x_i - x_s)^2},$$

with obvious modifications in the cases $i = 1$, $i = s$.


The distribution which we have constructed now gives (writing $q_i$ from here on for the probability $(1-p)q_i$ at $x_i$)

\[ U = p + q_{s+1} + \ldots + q_r. \]

The polynomial $1 - g(x)$ satisfies the requirements on $g(x)$ which are needed in determining $U$ when $T$ is $x < 0$. Hence, since the distribution depends only on $0, x_1, \ldots, x_r$, $U$, for this problem, is $q_1 + \ldots + q_s + p$. But $U$ for this problem is $1 - L$ for the original problem and so

\[ L = 1 - (q_1 + \ldots + q_s + p) = q_{s+1} + \ldots + q_r. \]

For example, with $\mu_1 = -1$, $\mu_2 = 2$, $\mu_3 = -4$, $\mu_4 = 10$ (equivalent to $\mu_1 = \mu_3 = 0$, $\mu_2 = 1$, $\mu_4 = 3$ and $T$ as $1 < x$), then $p = \frac12$ and $-2, 4, -8, 20$ are the first four moments of the distribution with probability 1 at $-2$ and zero at infinity. Hence $q_1 = \frac12$, $L = 0$ and $U = \frac12$. In this case $g(x) = \frac14 (x + 2)^2$ and is of degree less than $2n$ $(= 4)$.

If $\mu_1 = 0$, $\mu_2 = \frac12$, $\mu_3 = 0$, $\mu_4 = \frac12$ then $p = \frac12$, and for the distribution with moments $0, 1, 0, 1$ we have $E((x^2 - 1)^2) = 0$, so that the distribution has probabilities $\frac12$ at $x = -1$ and $x = 1$. This gives $L = \frac12 \cdot \frac12 = \frac14$, $U = \frac12 + \frac14 = \frac34$. In finding $U$ we have $g(x) = \frac14 (2x^4 - x^3 - 4x^2 + 3x + 4)$, so that the degree of $g(x)$ is $2n$ $(= 4)$.
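
The second example can be checked with exact rational arithmetic. The sketch below is only an illustration: the helper names `moment` and `g` are ours, and the grid test is a spot check of $g(x) \ge \chi_T(x)$ for $T$ taken as $0 \le x$, not a proof.

```python
from fractions import Fraction as Fr

def moment(points, probs, k):
    """k-th moment of a discrete distribution (illustrative helper)."""
    return sum(p * x**k for x, p in zip(points, probs))

def g(x):
    """Polynomial quoted in the text: g(x) = (2x^4 - x^3 - 4x^2 + 3x + 4) / 4."""
    return (2 * x**4 - x**3 - 4 * x**2 + 3 * x + 4) / Fr(4)

# Extremal distribution: p = 1/2 at 0 and (1 - p) * 1/2 = 1/4 at each of -1, +1.
points = [Fr(-1), Fr(0), Fr(1)]
probs = [Fr(1, 4), Fr(1, 2), Fr(1, 4)]

moments = [moment(points, probs, k) for k in range(1, 5)]
upper = sum(p * g(x) for x, p in zip(points, probs))    # E(g(x)), the value U
lower = sum(p for x, p in zip(points, probs) if x > 0)  # mass at the positive x_i, the value L

# Spot check on a grid that g dominates the indicator of T = {x >= 0}.
grid = [Fr(i, 100) for i in range(-500, 501)]
dominates = all(g(x) >= (1 if x >= 0 else 0) for x in grid)

print(moments, lower, upper, dominates)
```

Since $g(x) - 1 = \frac14 x (x - 1)^2 (2x + 3)$ and $g(x) = \frac14 (x + 1)^2 (2x^2 - 5x + 4)$, the domination holds for all $x$, not only on the grid.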

Taking $T$ as $-2 < x$ and the moments those of the normal distribution with zero mean and unit variance, we have the following results:

    n      L            U
    1    .8             1
    2    .8947          1
    3    .9016          1
    4    .9106        .9998 ...

If $T$ is $-1 < x$ then the same moments give:

    n      L            U
    1    .5             1
    2    .5             1
    3    .6038 ...    .9788
    4    .6209 ...    .9739 ...


It can be seen that the extra work involved in using higher moments 
adds little to the information yielded. 

In the above work it has been assumed that a set of consecutive 
moments is given; we could in fact deal in a similar way with any 
set of moments, except that the criteria for a distribution to consist 
of discrete probabilities are less easy to formulate. 

The fact that the values for L and U are best possible has been 
proved by Marshall and Olkin (1961), using the ideas of the theory 
of games, for the case discussed in this section as well as for some 
other cases. 


2.7 Absolute moments given 

If we are given absolute moments of a distribution about a value $a$ (which we shall assume to be zero) then we may regard the distribution as symmetrical about 0; and $T$, which will be defined initially only in terms of non-negative $x$, is the symmetrical set of intervals obtained by reflection in the point $x = 0$. If the moments given are those of orders $1, \ldots, n$ then we take

\[ g(x) = a_0 + a_1 |x| + \ldots + a_n |x|^n. \qquad (2.7.1) \]

Because of the symmetry of $\chi_T(x)$ it is sufficient to have $g(x) \ge \chi_T(x)$ or $g(x) \le \chi_T(x)$ (according as to whether it is $U$ or $L$ which we are finding) only for non-negative $x$; we can then drop the modulus signs in (2.7.1), and the difficulties which might be caused by the use of moduli are obviated.

We shall consider only the case when $T$ is $k \le x$ (corresponding to the case $|x| \ge k$ in general) and we then have, in finding $U$, to choose $g(x)$ so that

\[ g(x) \ge 0 \ \text{ for } \ 0 \le x < k, \qquad g(x) \ge 1 \ \text{ for } \ k \le x. \]
Since we are concerned only with positive values of $x$ we can subtract positive multiples of terms such as $(x - \alpha_1)^2 \cdots (x - \alpha_r)^2$ or $x (x - \alpha_1)^2 \cdots (x - \alpha_r)^2$; if $n$ is even, say $n = 2m$, this means that $g(x) = \chi_T(x)$ for at least $m + 1$ positive values of $x$ or for 0 and at least $m$ other values. Since $g(x)$ tends to infinity as $x$ tends to infinity (except in the trivial case when $g(x) = 1$), then if $g(x)$ has minima at the positive values there are in each case at least $2m + 1$ turning-points, and this is impossible; hence we must have $g(k) = \chi_T(k)$. This holds also for the case when $n$ is odd. By the argument used in Section 2.6 we can obtain a discrete distribution which has non-zero probability only at some or all of the points at which $g(x) = \chi_T(x)$ and possibly zero probability at infinity, and the values of $L$ and $U$ are obtained from this distribution.

If we are given not a consecutive set of moments but those of 
orders ñ, ..., i, then Wald (1939) has shown that the same results 
obtain, but a more refined argument about the changes of sign of 
polynomials containing only certain powers of x has to be used, and 
it is less easy to show the existence of the discrete distribution. 
Isii (1959b) has dealt with the problem on the lines which have been 
followed here. 

As an example, suppose k = 2 and the first two, three, or four 
absolute moments of the normal distribution are given. 

For $n = 2$ we take probabilities $p$ at $x = 2$ and $q$ at $x = \alpha$ to satisfy $1 = p + q$, $\sqrt{2/\pi} = 2p + q\alpha$, $1 = 4p + q\alpha^2$ and so we have $\sqrt{2/\pi} - 2 = q(\alpha - 2)$, $1 - 2\sqrt{2/\pi} = q\alpha(\alpha - 2)$ whence $\alpha = .4955 \ldots$, $p = .2010 \ldots$, $q = .7990 \ldots$, and finally $L = .7990 \ldots$, $U = 1$.
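
The two-point solution for $n = 2$ can be reproduced numerically. This is a minimal sketch under the equations just stated; the variable names are ours, and the last line evaluates the third absolute moment of the two-point distribution for comparison with $\nu_3 = 2\sqrt{2/\pi}$ of the normal distribution.

```python
import math

nu1 = math.sqrt(2 / math.pi)        # E|x| for the standard normal
nu2 = 1.0                           # E|x|^2
nu3 = 2 * math.sqrt(2 / math.pi)    # E|x|^3
k = 2.0

# From  nu1 - k = q (alpha - k)  and  nu2 - k nu1 = q alpha (alpha - k):
alpha = (nu2 - k * nu1) / (nu1 - k)
q = (nu1 - k) / (alpha - k)
p = 1 - q

# Third absolute moment of the two-point distribution (p at k, q at alpha).
third = k**3 * p + q * alpha**3
print(alpha, p, q, third, nu3)
```

The computed third moment exceeds $\nu_3$, so this two-point distribution cannot be used unchanged once the third absolute moment is also prescribed.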

For $n = 3$, if we take probabilities $p$ at $x = 2$ and $q$ at $x = \alpha$ as above, then $8p + q\alpha^3 = 1.7051 \ldots > 2\sqrt{2/\pi} = \nu_3$, and this inequality is only made worse if we try to add zero probability at infinity. Hence we must try instead $p$ at $x = 2$, $q$ at $x = 0$ and $r$ at $x = \alpha$, which give $p = .1735 \ldots$, $q = .1620 \ldots$, $r = .6645 \ldots$ and $\alpha = .6785 \ldots$ so that finally $L = .8265 \ldots$, $U = 1$. Since these values give $16p + r\alpha^4 = 2.916 \ldots < 3 = \nu_4$, we merely add probability zero at infinity to give the correct fourth moment when $n = 4$ and so obtain the same values of $L$ and $U$. This means that in $g(x)$ we take $a_4$ as zero and so continue to use a cubic polynomial.
The corresponding values of $L$ and $U$ for ordinary moments with intervals $2 < |x|$ are

    n      L            U
    2    .75            1
    3    .75            1
    4    .8182 ...      1

It will be observed that the absolute moments add little information when $n = 4$, but for $n = 2$ or for $n = 3$, when the third ordinary moment adds no information, a knowledge of the first and third absolute moments gives a considerable improvement.


2.8 Finite interval 

We have so far considered variables whose distribution can extend to infinity in each direction, and where we have the possibility of varying the moment of highest order which we are given by adding zero probability at infinity. If the distribution is restricted to lie in a finite interval then the methods given in Sections 2.6 and 2.7 still apply, except that now we may have non-zero probability at the ends of the interval. Suppose we are given the first $n$ moments $\mu_1, \ldots, \mu_n$, $f(x)$ is zero outside the interval $b \le x \le c$, and $T$ is $k \le x \le c$. The inequality $g(x) \ge \chi_T(x)$ (if we are finding $U$) has now to be satisfied only for $b \le x \le c$; hence we may reduce $E(g(x))$ by adding negative multiples of terms such as $(x - b)(x - \alpha_1)^2 \cdots$ or $(c - x)(x - \alpha_1)^2 \cdots$ or $(x - b)(c - x)(x - \alpha_1)^2 \cdots$. Note that the coefficient of $x^n$ in $g(x)$ may be of either sign, in contrast to the case when the interval is infinite. By an argument similar to that in Section 2.6 we shall have $g(k) = \chi_T(k)$. If $n$ is odd ($n = 2m - 1$) then if we have $g(b) = \chi_T(b)$ or $g(c) = \chi_T(c)$ we can have also $m - 1$ other values at which $g(x) = \chi_T(x)$; if $g(b) \ne \chi_T(b)$ and $g(c) \ne \chi_T(c)$ then we still cannot be sure of more than $m - 1$ other values, but in this case there will have to be some special relation between the moments for the $2m$ moment conditions (including the moment of order zero) to be satisfied by $m - 1$ values of $x$ and $m$ probabilities. Similarly if $n$ is even ($n = 2m$) we may have $g(x) = \chi_T(x)$ either at $k$ and $m$ other values different from $b$ and $c$, or at $b$, $k$, $c$ and $m - 1$ other values.
If $b = -1$, $c = 1$, $k = 0$, $\mu_1 = 0$, $\mu_2 = \frac13$ (moments of the rectangular distribution) then we construct distributions with probabilities $p$ at $x = -1$ and $q$ at $x = 0$ $(= k)$ or else $p$ at $x = -1$, $q$ at $x = 0$ and $r$ at $x = 1$. The first possibility leads to no solution, while the second gives $p = r = \frac16$, $q = \frac23$. Hence $U = \frac56$ and, by symmetry, $L = \frac16$.

Use of further moments gives $L$, $U$ as follows:

    n      L            U
    3    .1667 ...    .8333 ...
    4    .2778 ...    .7222 ...
    5    .2778 ...

If $T$ is $-\frac12 \le x \le 1$ we obtain

    n      L              U
    2    .4286 ...        1
    3    [illegible]      1
    4    [illegible]    .9286 ...
    5    .5472 ...      .9247 ...

As in other examples, higher moments yield a rapidly diminishing return on the labour involved in using them.


As in other examples, higher moments yield a rapidly diminishing 
return on the labour involved in using them. 


2.9 Expectations of general functions given 

When we are given not moments of a distribution but the 
expected values of functions other than powers of x, then the results 
we obtain will depend to a great extent on what these functions are. 
We can no longer, as in the sections above, use the familiar properties of polynomials.

If $E(\phi(x))$ is given and we require $U$ when $T$ is $k \le x$, then we consider $g(x) = a_0 + a_1 \phi(x)$ and require $a_0 + a_1 \phi(x) \ge \chi_T(x)$. This means that unless $a_1 = 0$, leading to the trivial solution $U = 1$, $\phi(x)$ must have a finite lower bound. We consider some special functions satisfying this condition.

If $\phi(x) = e^x$ we note that $E(e^x)$ can be increased by the addition of zero probability at $+\infty$, while a finite probability at a large negative value of $x$ will only affect $E(e^x)$ by a small amount, so that in this case we need also the limiting concept of finite probability at $-\infty$ which leaves $E(e^x)$ unaltered.

If $E(e^x) < e^k$ we can take probabilities $E(e^x)/e^k$ at $x = k$ and $1 - (E(e^x)/e^k)$ at $-\infty$ to give $U = E(e^x)/e^k$; if $E(e^x) \ge e^k$ then $U = 1$. In this simple case the reader will find it instructive to sketch the convex set and support line in the $(a_0, a_1)$ plane.

For the normal distribution with zero mean and unit variance we have $E(e^x) = e^{1/2}$ and for $k > \frac12$ we have $U = e^{\frac12 - k}$. From the moments $\mu_1 = 0$, $\mu_2 = 1$ we have (see Section 2.6) $U = 1/(k^2 + 1)$ for $k > 0$, so that knowledge of $E(e^x)$ (which implies much greater knowledge of the distribution for large positive $x$) gives more information when $k$ is large and positive, but less for $k < 2.44 \ldots$.

If $\phi(x) = e^{x^2/4}$ then we can no longer have finite probability at $-\infty$, but if $E(e^{x^2/4}) < e^{k^2/4}$ we can have probabilities

\[ (e^{k^2/4} - E(e^{x^2/4}))/(e^{k^2/4} - 1) \ \text{ at } x = 0 \ \text{ and } \ (E(e^{x^2/4}) - 1)/(e^{k^2/4} - 1) \ \text{ at } x = k, \]
giving
\[ U = (E(e^{x^2/4}) - 1)/(e^{k^2/4} - 1). \]

Again a sketch of the $(a_0, a_1)$ plane will be helpful.

For the normal distribution as above, we have $E(e^{x^2/4}) = \sqrt2$, giving $U = (\sqrt2 - 1)/(e^{k^2/4} - 1)$. This is an improvement on $U = 1/(k^2 + 1)$ for $k > 2.23 \ldots$.
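
The two crossover values of $k$ quoted above can be located by bisection. The following is a hedged sketch: the function names are ours, and the bracketing interval $[1, 5]$ is an assumption chosen so that each difference changes sign once inside it.

```python
import math

def bisect(f, lo, hi, tol=1e-10):
    """Root of f on [lo, hi] by bisection; assumes f(lo) and f(hi) differ in sign."""
    flo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def bound_moments(k):
    """U = 1/(k^2 + 1) from mu_1 = 0, mu_2 = 1."""
    return 1.0 / (k * k + 1.0)

def bound_exp(k):
    """U = e^{1/2 - k} from E(e^x) = e^{1/2}."""
    return math.exp(0.5 - k)

def bound_exp_sq(k):
    """U = (sqrt(2) - 1)/(e^{k^2/4} - 1) from E(e^{x^2/4}) = sqrt(2)."""
    return (math.sqrt(2) - 1) / (math.exp(k * k / 4) - 1)

k1 = bisect(lambda k: bound_exp(k) - bound_moments(k), 1.0, 5.0)
k2 = bisect(lambda k: bound_exp_sq(k) - bound_moments(k), 1.0, 5.0)
print(round(k1, 4), round(k2, 4))
```

Above each root the exponential-type bound is the smaller of the two, which is the sense in which it "gives more information" for large $k$.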

For the case when the expectations of two functions are given we shall consider particular conditions which lead to a result of von Mises (1939). We suppose that $f(x)$ is non-zero only for $0 \le x \le d$, that $\phi_1(x)$ and $\phi_2(x)$ are the functions whose expectations are given and are such that

\[ \phi_1(0) = \phi_2(0) = 0, \quad \phi_1'(x) > 0, \quad \phi_2'(x) > 0, \quad \text{and} \quad \phi_2''(x)\phi_1'(x) - \phi_1''(x)\phi_2'(x) > 0. \]

The conditions $\phi_1'(x) > 0$, $\phi_2'(x) > 0$ ensure that there is just one value of $x$ for a given value of $\phi_1(x)$ or $\phi_2(x)$, while the condition

\[ \phi_2''(x)\phi_1'(x) - \phi_1''(x)\phi_2'(x) > 0 \]

means that $\phi_2'(x)/\phi_1'(x)$ is strictly increasing, so that, for given $a_1$, $a_2$ not both zero, the equation $a_1\phi_1'(x) + a_2\phi_2'(x) = 0$ has at most one solution in $x$. Hence $g(x) = a_0 + a_1\phi_1(x) + a_2\phi_2(x)$ takes a given value for at most two values of $x$, for each set of values of $a_0$, $a_1$, $a_2$ ($a_1$, $a_2$ not both zero). We define $d_1$, $d_2$ by the equations

\[ \frac{\phi_2(d_1)}{\phi_1(d_1)} = \frac{E(\phi_2(x))}{E(\phi_1(x))} \]
and
\[ \begin{vmatrix} \phi_1(d_2) & \phi_2(d_2) & 1 \\ E(\phi_1(x)) & E(\phi_2(x)) & 1 \\ \phi_1(d) & \phi_2(d) & 1 \end{vmatrix} = 0. \]

In geometrical terms, the curve $u = \phi_1(x)$, $v = \phi_2(x)$ in co-ordinates $(u, v)$ between the origin $O$ and the point $D$ $(\phi_1(d), \phi_2(d))$ and the chord $OD$ together enclose a convex region, and the point $G$ $(E(\phi_1(x)), E(\phi_2(x)))$ lies inside or on the boundary of this region (this is most easily seen by approximating to the p.d.f. used to give the expected values by a discrete distribution). Hence the points $D_1$ $(\phi_1(d_1), \phi_2(d_1))$ and $D_2$ $(\phi_1(d_2), \phi_2(d_2))$, which lie on $OG$, $DG$ respectively, lie in the arc $OD$.


If $0 < k < d_2$, let $k'$ be defined by

\[ \begin{vmatrix} \phi_1(k') & \phi_2(k') & 1 \\ E(\phi_1(x)) & E(\phi_2(x)) & 1 \\ \phi_1(k) & \phi_2(k) & 1 \end{vmatrix} = 0, \]

i.e. the points $K$ $(\phi_1(k), \phi_2(k))$, $K'$ $(\phi_1(k'), \phi_2(k'))$ and $G$ are collinear. We obtain the correct expectations $E(\phi_1(x))$, $E(\phi_2(x))$ with probabilities

\[ \frac{-E(\phi_1(x)) + \phi_1(k')}{-\phi_1(k) + \phi_1(k')} = \frac{\phi_2(k') - E(\phi_2(x))}{\phi_2(k') - \phi_2(k)} \quad \text{at } x = k \]
and
\[ \frac{E(\phi_1(x)) - \phi_1(k)}{\phi_1(k') - \phi_1(k)} = \frac{E(\phi_2(x)) - \phi_2(k)}{\phi_2(k') - \phi_2(k)} \quad \text{at } x = k'. \]

Now if $a_0$, $a_1$, $a_2$ are such that $g(k) = 1$, $g(k') = 0$, $g'(k') = 0$, we have $g(x) \ge 0$ for $x \ne k'$ and also

\[ g(0) = a_0 = \frac{\phi_1'(k')\phi_2(k') - \phi_2'(k')\phi_1(k')}{\phi_1'(k')\phi_2(k') - \phi_2'(k')\phi_1(k') + \phi_2'(k')\phi_1(k) - \phi_1'(k')\phi_2(k)}. \]

The function $\phi(x) = \phi_2(x)\phi_1'(k') - \phi_1(x)\phi_2'(k')$ has zero derivative at $x = k'$, and this is its only stationary point. It is 0 for $x = 0$ and consequently $a_0 = \phi(k')/(\phi(k') - \phi(k)) > 1$, so that $g(x) > 1$ for $0 \le x < k$.

Hence $U = (\phi_1(k') - E(\phi_1(x)))/(\phi_1(k') - \phi_1(k))$ and, from the construction, $L = 0$. A similar argument holds for $d_1 < k < d$, giving

\[ L = (E(\phi_1(x)) - \phi_1(k))/(\phi_1(k') - \phi_1(k)), \qquad U = 1. \]

For $d_2 \le k \le d_1$ we take probabilities

\[ p_1 = \frac{(E(\phi_1(x)) - \phi_1(d))(E(\phi_2(x)) - \phi_2(k)) - (E(\phi_1(x)) - \phi_1(k))(E(\phi_2(x)) - \phi_2(d))}{\phi_1(d)\phi_2(k) - \phi_1(k)\phi_2(d)} \quad \text{at } x = 0, \]

\[ p_2 = \frac{E(\phi_2(x))\phi_1(d) - E(\phi_1(x))\phi_2(d)}{\phi_1(d)\phi_2(k) - \phi_1(k)\phi_2(d)} \quad \text{at } x = k, \]

\[ p_3 = \frac{E(\phi_1(x))\phi_2(k) - E(\phi_2(x))\phi_1(k)}{\phi_1(d)\phi_2(k) - \phi_1(k)\phi_2(d)} \quad \text{at } x = d, \]

and choose $a_0$, $a_1$, $a_2$ so that $1 = g(0) = g(k)$, $0 = g(d)$; since $g(x)$ has a single turning-point this gives $g(x) \ge \chi_T(x)$, and we have $L = p_1$, $U = p_1 + p_2$.

If the distribution has infinite range we can let $d$ tend to infinity in the above values. If $\phi_1(d)$ and $\phi_2(d)$ remain finite as $d$ tends to infinity, we may need positive probability at infinity; if $\phi_2(d)/\phi_1(d)$ tends to infinity as $d$ tends to infinity then $E(\phi_2(x))$ may be affected by zero probability at infinity.

As an example, if $\phi_1(x) = e^{x^2/8} - 1$, $\phi_2(x) = e^{x^2/4} - 1$, and $E(\phi_1(x)) = \sqrt{4/3} - 1$, $E(\phi_2(x)) = \sqrt2 - 1$ (the values for the normal distribution with zero mean and unit variance), then $d_1 = \sqrt{8 \log\{(\sqrt6 - 2)/(2 - \sqrt3)\}} = 2.034 \ldots$, $d_2 = \sqrt{8 \log(2/\sqrt3)} = 1.073 \ldots$. For $k = 3$ we have $k' = .924 \ldots$ which gives $L = .979 \ldots$.

Using a single function $\phi(x)$, we have $L = 1 - E(\phi(x))/\phi(k)$ which gives $L = .926 \ldots$ for $\phi_1(x)$ and $L = .951 \ldots$ for $\phi_2(x)$, $U$ being 1 in every case.
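
The numerical example for two functions can be reproduced as follows. This is a sketch under the assumption (read from the quoted expectations $\sqrt{4/3} - 1$ and $\sqrt2 - 1$) that $\phi_1(x) = e^{x^2/8} - 1$ and $\phi_2(x) = e^{x^2/4} - 1$; the helper `x_of` and the use of the substitution $y = e^{x^2/8}$ are ours.

```python
import math

def phi1(x):
    return math.exp(x * x / 8) - 1

def phi2(x):
    return math.exp(x * x / 4) - 1

E1 = 2 / math.sqrt(3) - 1     # E(phi1) for the standard normal
E2 = math.sqrt(2) - 1         # E(phi2) for the standard normal

# With y = exp(x^2/8) the curve is (u, v) = (y - 1, y^2 - 1), a parabola,
# so the chord joining parameter values y and y' has slope y + y'.
def x_of(y):
    return math.sqrt(8 * math.log(y))

d1 = x_of(E2 / E1 - 1)        # D1 lies on the line OG (O corresponds to y = 1)
d2 = x_of(E1 + 1)             # as d -> infinity, DG becomes the vertical line u = E1

k = 3.0
slope = (phi2(k) - E2) / (phi1(k) - E1)          # slope of the line KG
k_prime = x_of(slope - math.exp(k * k / 8))      # second intersection K'

L_two = (E1 - phi1(k)) / (phi1(k_prime) - phi1(k))   # bound using both functions
L_phi1 = 1 - E1 / phi1(k)                            # bound using phi1 alone
L_phi2 = 1 - E2 / phi2(k)                            # bound using phi2 alone
print(d1, d2, k_prime, L_two, L_phi1, L_phi2)
```

The two-function bound is visibly sharper than either single-function bound at $k = 3$.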


2.10 Mean range given 

The information about the distribution of the variate x in the 
previous sections of this chapter has all been in the form of the 
expectations of various functions of $x$, these functions being independent of the particular distribution. In this section we take as datum the mean range of a sample of given size; this can be represented only as the expectation of a function of the distribution function, and the method used hitherto no longer applies. Instead we use a method which consists essentially of replacing the given distribution by one more closely grouped about the median. It will be seen that the inequality obtained is different in type from those in the earlier sections.

If $w$ is the mean range in samples of $n$ from the population with distribution function $F(x)$, then

\[ w = \int_{-\infty}^{\infty} (1 - F^n - (1 - F)^n) \, dx = \int_{-\infty}^{\infty} R(F) \, dx, \ \text{say}. \]

(For a proof, see, for example, Kendall and Stuart (1958, 1963), p. 339.)

Let $W(m) = \sum_{i=1}^{m-1} R(i/m)$: since $d^2R/dF^2 < 0$ for $0 < F < 1$, we have, by comparing areas under chords and tangents to the graph of $R(F)$ and under the graph itself,

\[ m \int_{1/(2m)}^{1 - 1/(2m)} R(u) \, du < W(m) < m \int_0^1 R(u) \, du. \]
0 
Hence

\[ W(m+1) - W(m) > (m+1) \int_{1/\{2(m+1)\}}^{1 - 1/\{2(m+1)\}} R(u) \, du - m \int_0^1 R(u) \, du \]
\[ = \frac{2}{n+1} \left\{ m - (m+1) \left( \frac{2m+1}{2m+2} \right)^{n+1} + (m+1) \left( \frac{1}{2m+2} \right)^{n+1} \right\} \]
\[ > \frac{2}{n+1} \left\{ m - (m+1) \left( \frac{2m+1}{2m+2} \right)^{3} \right\} > 0; \]

also $W(1) = 0$, and $W(m)$ tends to infinity as $m$ tends to infinity. Hence there is an integer $m$ uniquely defined by $W(m) \le t^{-1} < W(m+1)$ for any given positive value $t$. For this value of $m$ define $p$, where

\[ \frac{1}{m+1} < p \le \frac{1}{m} \quad \text{and} \quad t^{-1} = \sum_{i=1}^{m} R(ip). \qquad (2.10.1) \]


This gives a unique value for $p$ in the given range since the right-hand side of (2.10.1) has a negative derivative with respect to $p$ for $p = 1/(m+1)$ and its second derivative is negative for $0 < p < 1/m$.
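
The determination of $m$ and $p$ from $t$ can be sketched numerically. The code assumes the reading of (2.10.1) as $t^{-1} = \sum_{i=1}^{m} R(ip)$ with $1/(m+1) < p \le 1/m$; the function names are ours, and the bisection relies on the sum being decreasing in $p$ on that interval.

```python
def R(F, n):
    """Integrand of the mean range: R(F) = 1 - F^n - (1 - F)^n."""
    return 1.0 - F**n - (1.0 - F)**n

def W(m, n):
    """W(m) = sum of R(i/m) for i = 1, ..., m - 1 (so W(1) = 0)."""
    return sum(R(i / m, n) for i in range(1, m))

def solve_p(t, n, tol=1e-12):
    """Return (m, p) with W(m) <= 1/t < W(m+1) and sum_{i=1..m} R(ip) = 1/t."""
    target = 1.0 / t
    m = 1
    while W(m + 1, n) <= target:
        m += 1

    def S(p):
        return sum(R(i * p, n) for i in range(1, m + 1))

    lo, hi = 1.0 / (m + 1), 1.0 / m   # S(lo) = W(m+1) > target >= S(hi) = W(m)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if S(mid) > target:
            lo = mid
        else:
            hi = mid
    return m, 0.5 * (lo + hi)

print(solve_p(1.0, 2), solve_p(2.0, 2))
```

For samples of two, $R(F) = 2F(1 - F)$, so with $t = 1$ the equation reduces to $12p - 28p^2 = 1$, whose root in $(\frac14, \frac13]$ is $(12 + \sqrt{32})/56$.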


Now $R(F) + R(F + p) + \ldots + R(F + mp)$ has a negative second derivative with respect to $F$, and for $F = 0$ and $F = 1 - mp$ it has the value $t^{-1}$; hence we have

\[ R(F) + \ldots + R(F + mp) \ge t^{-1} \quad \text{for } 0 \le F \le 1 - mp. \qquad (2.10.2) \]
Similarly
\[ R(F) + \ldots + R(F + (m-1)p) \ge t^{-1} \quad \text{for } 1 - mp \le F \le p. \qquad (2.10.3) \]

We now suppose that, for a value $t' > t$ and for all $x$,

\[ \int_x^{x + t'w} dF(x) < p. \qquad (2.10.4) \]
We can write this as
\[ \int_F^{F + p} dx > t'w. \qquad (2.10.5) \]

We can approximate arbitrarily closely to a given distribution by one for which $f(x) > 0$ and so one for which the distribution function $F(x)$ has a uniquely defined inverse function. We may therefore suppose that this situation obtains. We may also suppose that the median of the distribution is $x = 0$ (since all conditions are independent of the choice of origin) and we define $F_1(x)$ by

\[ F_1(x) = F(x) \quad \text{for } 0 \le x, \]
\[ F_1(x) = \max(0, F(x + kt'w) - kp) \quad \text{for } x < 0, \]

where $k$ is the integer such that $-kt'w \le x < -(k-1)t'w$.

We then define $F_2(x)$ by

\[ F_2(x) = F_1(x) \quad \text{for } x < 0, \]
\[ F_2(x) = \min(1, F_1(x - lt'w) + lp) \quad \text{for } 0 \le x, \]

where $l$ is the integer such that $(l-1)t'w \le x < lt'w$. The graph of $F_2(x)$ thus consists of repetitions of the portion of the graph of $F(x)$ for $0 \le x < t'w$, at heights above the $x$-axis which ensure that, for $F_2(x)$, the inequalities (2.10.4) and (2.10.5) hold, with equality as far as possible.

Using (2.10.5), we have $F_2(x) = F_1(x) \le F(x) \le \frac12$ for $x \le 0$ and $\frac12 \le F(x) = F_1(x) \le F_2(x)$ for $0 \le x$. Hence $R(F_2(x)) \le R(F(x))$ for all $x$, and so

\[ w \ge \int_{-\infty}^{\infty} R(F_2(x)) \, dx. \qquad (2.10.6) \]


$F_2(x)$ is monotonic increasing and continuous on the right everywhere. It is also continuous on the left, except possibly at points where $x/t'w$ is an integer. Whenever $0 < F_2(x) < 1 - p$ we have

\[ F_2(x + t'w) = F_2(x) + p \qquad (2.10.7) \]

and so $F_2(x)$ defines a distribution with a finite range from $x_0$ to $x_1$ where, from (2.10.7), $mt'w \le x_1 - x_0 < (m+1)t'w$. If we let $x_1 - x_0 = mt'w + r$, then $F_2(x_0 + r) = F_2(x_1) - mp = 1 - mp$ and $F_2(x_0 + t'w - 0) = p$.

Now, using (2.10.7), we have

\[ \int_{-\infty}^{\infty} R(F_2(x)) \, dx = \int_{x_0}^{x_0 + r} \{R(F_2(x)) + \ldots + R(F_2(x) + mp)\} \, dx + \int_{x_0 + r}^{x_0 + t'w} \{R(F_2(x)) + \ldots + R(F_2(x) + (m-1)p)\} \, dx. \]
Using (2.10.2), (2.10.3) and (2.10.6), this gives $w \ge t^{-1} t'w$ so that $t \ge t'$. This contradicts the assumption that $t < t'$ and shows that (2.10.4) must be false. Hence for any $t' > t$ we have

\[ \sup_x \int_x^{x + t'w} dF(x) \ge p \]
and so
\[ \sup_x \int_x^{x + tw} dF(x) \ge p, \]

which is the inequality proved by Winsten (1946) who gave tables to assist in the calculation of $p$ from $t$. It is to be noted that unlike the previous inequalities it relates not to probability in a single interval but in a class of intervals.

The inequality is best possible, as can be seen from consideration of the discrete distribution with probabilities $p$ at $x = \alpha, \alpha + t'w(1 + \epsilon), \ldots, \alpha + (m-1)t'w(1 + \epsilon)$ and $1 - pm$ at $x = \alpha + mt'w(1 + \epsilon)$. $\alpha$ can be chosen so that the median of the distribution is at $x = 0$, and we have

\[ \sup_x \int_x^{x + t'w} dF(x) = p, \]
while
\[ w = t'w(1 + \epsilon) \sum_{i=1}^{m} R(ip), \]

so that $t = t'(1 + \epsilon)$. By taking $\epsilon$ positive but arbitrarily small we see that no improvement is possible in the inequality.


CHAPTER III 


UNIVARIATE DISTRIBUTIONS: 
GEOMETRICAL DATA 


3.1 Introduction 


In the previous chapter the only restriction placed on a p.d.f. 
was that it should be non-negative, and the distributions giving the 
values L and U were all discrete ones. If we now restrict the p.d.f. or its derivatives in some way, such distributions may become inadmissible and the values of L and U may be altered. In this chapter we suppose that the signs of the derivatives are specified, and we show how the method used in Chapter II can be modified to deal with this situation. As examples of the method we shall obtain a number of inequalities obtained by other writers by various methods.


3.2 Unimodal distribution: second moment about mode given 


We consider first the case of a unimodal distribution for which 
the second moment about the mode is given and for which T is an 
interval symmetrical about the mode; this case is the oldest one 
considered, the result having been given by Gauss in 1821. 

We may fold the distribution about the mode, which we suppose to be at the origin, and we then have

\[ f'(x) \le 0 \ \text{ for } 0 < x, \qquad f(x) = 0 \ \text{ for } x < 0, \]

while $T$ is $0 \le x \le k$.

As in Chapter II, we define $g(x)$ as $a_0 + a_2 x^2$ and we compare

\[ \int_0^\infty f(x) \chi_T(x) \, dx \quad \text{with} \quad \int_0^\infty f(x) g(x) \, dx. \]

Integration by parts gives for these integrals

\[ -\int_0^\infty f'(x) X_T(x) \, dx \quad \text{and} \quad -\int_0^\infty f'(x) G(x) \, dx \quad \text{respectively,} \]

where

\[ X_T(x) = \int_0^x \chi_T(u) \, du \quad \text{and} \quad G(x) = \int_0^x g(u) \, du \quad \text{for } 0 \le x, \]

and $X_T(x) = G(x) = 0$ for $x < 0$. (If $f(x)$ possesses a finite second moment then it must tend to zero in such a way that $f(x) X_T(x)$ and $f(x) G(x)$ tend to zero as $x$ tends to infinity.) To find $U$ we want to have

\[ -\int_0^\infty f'(x) X_T(x) \, dx \le -\int_0^\infty f'(x) G(x) \, dx \]

and so, since $f'(x) \le 0$, we need $X_T(x) \le G(x)$; if, moreover, $G(x) = X_T(x)$ whenever $f'(x) \ne 0$ then we shall have strict equality between the integrals and so obtain the exact value of $U$.

For the case considered,

\[ X_T(x) = 0 \ \text{ for } x < 0, \qquad = x \ \text{ for } 0 \le x \le k, \qquad = k \ \text{ for } k < x, \]

and $G(x) = b + a_0 x + a_2 x^3/3$ for $0 \le x$, $G(x) = 0$ for $x < 0$.


Since $f'(x) \ne 0$ at $x = 0$ we must take $b = 0$. (Strictly $f'(x)$ does not exist at $x = 0$, but we can replace all "vertical" parts of the graph of $f(x)$ by steeply sloping lines and proceed to the limit to get the inequality we want.) To find $U$ we now have to reduce $G(x)$ as far as possible while still satisfying $G(x) \ge X_T(x)$ for $0 \le x$.

We have $a_2 \ge 0$ (since $G(x) \ge X_T(x)$ for large positive $x$) and $a_0 \ge 1$ (since $G(x) \ge X_T(x)$ for small positive $x$) and so $G(x) \ge x$; $G(x) = x$ gives $g(x) = 1$ and $U = 1$. This can be seen independently since we can take $f(x) = (1 - \epsilon)/k$ for $0 < x < k$, and $f(x) = \epsilon/l$ for $k < x < k + l$, where $l$ is chosen so that

\[ \tfrac13 (1 - \epsilon) k^2 + \frac{\epsilon \{(k + l)^3 - k^3\}}{3l} = \mu_2. \]

This gives the correct value for $\mu_2$ and $P(T) = 1 - \epsilon$, where $\epsilon$ can be arbitrarily small. We can describe this process as adding an "infinite tail of zero probability", corresponding to the process of adding zero probability at infinity when we have no restriction on $f'(x)$.

In finding $L$ we have $G(x) \le X_T(x)$ and we want to make $G(x)$ as large as possible. By increasing $a_0$ or $a_2$ we can make the graphs of $G(x)$ and $X_T(x)$ touch at $x = \alpha$ where $k < \alpha$. (It would be impossible for the graphs to touch at $x = \beta$ where $0 < \beta < k$ since $a_2 \le 0$ and $G(x) \le a_0 x \le x$.)


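Then

\[ G(x) = k - k (x - \alpha)^2 (x + 2\alpha) / 2\alpha^3 \]
giving
\[ g(x) = \frac{3k}{2\alpha^3} (\alpha^2 - x^2) \]

and since we must have $G'(0) \le 1$ we need $3k \le 2\alpha$. We also have

\[ E(g(x)) = (3k/2\alpha) - (3k\mu_2/2\alpha^3). \]
Hence
\[ \frac{d}{d\alpha} E(g(x)) = -\frac{3k}{2\alpha^2} + \frac{9k\mu_2}{2\alpha^4} = \frac{3k}{2\alpha^4} (3\mu_2 - \alpha^2). \]

If $k \ge \sqrt{4\mu_2/3}$ then for $\alpha \ge 3k/2$ we have $3\mu_2 - \alpha^2 \le 0$, and $E(g(x))$ is greatest for $\alpha = 3k/2$, so that

\[ L = 1 - \frac{4\mu_2}{9k^2}. \qquad (3.2.1) \]

If $k < \sqrt{4\mu_2/3}$ then $E(g(x))$ is greatest for $\alpha = \sqrt{3\mu_2}$, giving

\[ L = \frac{k}{\sqrt{3\mu_2}}. \qquad (3.2.2) \]

The results (3.2.1) and (3.2.2) are known as the Gauss-Winkler inequalities, having been stated (without proof) by Gauss in 1821 and extended by Winkler in 1866. (See Fréchet (1950).)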
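
The two branches (3.2.1) and (3.2.2) can be combined into a single function. This is an illustrative sketch only: the function names are ours, and the Chebyshev-type bound $1 - \mu_2/k^2$ (which uses no unimodality) is included purely for comparison.

```python
import math

def gauss_lower_bound(k, mu2):
    """Lower bound L for P(|x - mode| <= k) of a unimodal distribution,
    where mu2 is the second moment about the mode."""
    if k >= math.sqrt(4 * mu2 / 3):
        return 1 - 4 * mu2 / (9 * k * k)      # (3.2.1)
    return k / math.sqrt(3 * mu2)             # (3.2.2)

def chebyshev_lower_bound(k, mu2):
    """Bound without the unimodality restriction, for comparison."""
    return max(0.0, 1 - mu2 / (k * k))

for k in (0.5, 1.0, math.sqrt(4 / 3), 2.0, 3.0):
    print(k, gauss_lower_bound(k, 1.0), chebyshev_lower_bound(k, 1.0))
```

The two branches agree at $k = \sqrt{4\mu_2/3}$, where both give $L = \frac23$.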


3.3 Unimodal distribution: first and second absolute moments 
about mode given 

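If in addition to $\mu_2 = \nu_2$ we also know $\nu_1$ (i.e. $\mu_1$ for the folded distribution) then we take $G(x)$ to be

\[ a_0 x + \tfrac12 a_1 x^2 + \tfrac13 a_2 x^3. \]

We must have $3\nu_2 \ge 4\nu_1^2$ since

\[ \int_0^\infty f'(x) \, x (u + tx)^2 \, dx \]

is a form in $u$ and $t$ which cannot take positive values.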

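In evaluating $U$ we now have the possibility of $G(x)$ being equal to $X_T(x)$ at $x = \alpha$, where $0 < \alpha < k$, or at $x = \beta$ where $k < \beta$. In the latter case, by adding a negative multiple of $x (x - \beta)^2$ to $G(x)$ we can make $G(x)$ equal to $X_T(x)$ at $x = k$ also.

In the first case we have

\[ 3a_1 = -4\alpha a_2, \qquad 6(a_0 - 1) = 2\alpha^2 a_2, \]
giving
\[ \psi = \frac{a_2}{3} (\alpha^2 - 4\alpha\nu_1 + 3\nu_2) + 1 \ge 1, \]
and so $U = 1$.

In the second case $G(x) = k + (x - k)(x - \beta)^2/\beta^2$,

\[ \psi = 3\nu_2/\beta^2 - 2(2\beta + k)\nu_1/\beta^2 + (\beta^2 + 2\beta k)/\beta^2 = 1 + (2k - 4\nu_1)/\beta + (3\nu_2 - 2k\nu_1)/\beta^2, \]

\[ \frac{d\psi}{d\beta} = (4\nu_1 - 2k)/\beta^2 - 2(3\nu_2 - 2k\nu_1)/\beta^3. \]

Now if $k < 2\nu_1$ then $3\nu_2 - 2k\nu_1 > 0$ and $d\psi/d\beta = 0$ for

\[ \beta = \frac{3\nu_2 - 2k\nu_1}{2\nu_1 - k}, \]
giving
\[ U = 1 - \frac{(2\nu_1 - k)^2}{3\nu_2 - 2k\nu_1}; \]

but if $2\nu_1 \le k$ then $\psi$ decreases for $k < \beta$ and so $U = 1$.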
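
The minimisation over $\beta$ in the second case can be checked numerically. The sketch below assumes the form $\psi(\beta) = 1 + (2k - 4\nu_1)/\beta + (3\nu_2 - 2k\nu_1)/\beta^2$ and compares a brute-force grid minimum with the closed form $1 - (2\nu_1 - k)^2/(3\nu_2 - 2k\nu_1)$; the moments used are those of the folded standard normal distribution, chosen by us purely as an illustration.

```python
import math

def psi(beta, k, nu1, nu2):
    """E(g(x)) for G(x) = k + (x - k)(x - beta)^2 / beta^2."""
    return 1 + (2 * k - 4 * nu1) / beta + (3 * nu2 - 2 * k * nu1) / beta**2

def upper_bound(k, nu1, nu2):
    """Closed-form minimum of psi over beta > k (equal to 1 when k >= 2 nu1)."""
    if k >= 2 * nu1:
        return 1.0
    return 1 - (2 * nu1 - k) ** 2 / (3 * nu2 - 2 * k * nu1)

nu1, nu2, k = math.sqrt(2 / math.pi), 1.0, 1.0   # folded standard normal

# Brute-force minimum of psi on a grid of beta values in (k, k + 50).
grid_min = min(psi(k + 1e-4 * i, k, nu1, nu2) for i in range(1, 500_000))
print(upper_bound(k, nu1, nu2), grid_min)
```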


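In evaluating $L$ we can have $G(x) = X_T(x)$ for $x = \alpha$ with $0 < \alpha < k$ and for $x = \beta$ with $k < \beta$, or $G'(0) = X_T'(0)$ $(= 1)$ and $G(\beta) = X_T(\beta)$, or $a_2 = 0$, $G(\beta) = X_T(\beta)$, $G'(0) < 1$. (It may be recalled that in Section 2.7 we had an example in which the highest moment produced no effect.)

In the first case we have

\[ G(x) = k - k (x - \beta)^2 (x + \gamma) / \beta^2 \gamma \]

and we need $k - k (x - \beta)^2 (x + \gamma) / \beta^2 \gamma = x$ to have roots $0, \alpha, \alpha$.

Hence

\[ \gamma - 2\beta = -2\alpha \quad \text{and} \quad \beta^2 - 2\beta\gamma + \beta^2 \gamma / k = \alpha^2. \]

Hence $\gamma = 2(\beta - \alpha)$ and $2\beta^3 - \beta^2 (2\alpha + 3k) + 4\alpha\beta k - \alpha^2 k = 0$, i.e. $(\beta - \alpha)(2\beta^2 - 3k\beta + k\alpha) = 0$.

Hence $\alpha = (3k\beta - 2\beta^2)/k$ and we must have $\beta \le 3k/2$. Since

\[ (k - \alpha) = (k - \beta)(k - 2\beta)/k \]

the condition $\alpha < k$ is automatically fulfilled.

This expression for $G(x)$ gives

\[ \psi = \psi_1 = \frac{k}{4\beta^3 (\beta - k)} \{ -3k\nu_2 + 4\nu_1 (3k\beta - 2\beta^2) - 9k\beta^2 + 8\beta^3 \}. \]

In the second case we have

\[ G(x) = k - (x - \beta)^2 \{ (2k - \beta) x + k\beta \} / \beta^3 \]

so that $\beta \le 2k$ and

\[ \psi = \psi_2 = -\{ 3\nu_2 (2k - \beta) + 2\nu_1 (2\beta^2 - 3\beta k) - \beta^3 \} / \beta^3. \]

In the third case we have

\[ G(x) = k - k (x - \beta)^2 / \beta^2 \]

and we need $2k < \beta$ and have $\psi = \psi_3 = 2k (\beta - \nu_1) / \beta^2$.

If $\psi_1 = \psi_2$ then $(2\beta - 3k)^2 (\beta^2 - 4\nu_1 \beta + 3\nu_2) = 0$, and so $\psi_1 - \psi_2$ is of constant sign for $k < \beta$ ($< 3k/2$ for $\psi_1$ to be considered).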


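Now, for $\beta = k$, the expression

\[ -3k\nu_2 + 4\nu_1 (3k\beta - 2\beta^2) - 9k\beta^2 + 8\beta^3 \]

equals $-k^3 + 4k^2 \nu_1 - 3k\nu_2$ and is negative unless $4\nu_1^2 = 3\nu_2$ and $k = 2\nu_1$. In this case $\psi_1 = \nu_1 (4\beta^2 - 5\nu_1 \beta + 2\nu_1^2) / \beta^3$ (for $\beta > 2\nu_1$) and this tends to 1 as $\beta$ tends to $2\nu_1$, while $\psi_2$ is 1 for this value of $\beta$. Hence in any case $\psi_1 \le \psi_2$ and so we calculate $L$ from $\psi_1$ for $k < \beta < 3k/2$, from $\psi_2$ for $3k/2 < \beta < 2k$, and from $\psi_3$ for $2k < \beta$.

Now $d\psi_1/d\beta = 0$ for

\[ \frac{3}{\beta} + \frac{1}{\beta - k} - \frac{24\beta^2 - 18k\beta - 16\nu_1 \beta + 12\nu_1 k}{8\beta^3 - 9k\beta^2 + 4\nu_1 (3k\beta - 2\beta^2) - 3k\nu_2} = 0 \]
i.e.
\[ \frac{4\beta - 3k}{\beta (\beta - k)} - \frac{(4\beta - 3k)(6\beta - 4\nu_1)}{8\beta^3 - 9k\beta^2 + 4\nu_1 (3k\beta - 2\beta^2) - 3k\nu_2} = 0 \]
i.e.
\[ 2\beta^3 - \beta^2 (4\nu_1 + 3k) + 8k\nu_1 \beta - 3k\nu_2 = 0 \qquad (3.3.1) \]
i.e.
\[ -(\beta - k)^2 (2\beta + k - 4\nu_1) = -k^3 + 4k^2 \nu_1 - 3k\nu_2. \]

Since $-k^3 + 4k^2 \nu_1 - 3k\nu_2 < 0$, $d\psi_1/d\beta$ is equal to 0 for just one value of $\beta$ greater than $k$.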


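For $\beta = 3k/2$,

\[ \frac{d\psi_2}{d\beta} = -\frac{16}{9k^3} (\nu_1 k - \nu_2) \]

and, since $\psi_1 = \psi_2$ has a double root for $\beta = 3k/2$, $d\psi_1/d\beta$ has the same value there. Since

\[ \frac{d\psi_2}{d\beta} = \frac{(\beta - 3k)(4\nu_1 \beta - 6\nu_2)}{\beta^4} \quad \text{and} \quad \frac{d\psi_3}{d\beta} = \frac{2k}{\beta^3} (2\nu_1 - \beta) \]

we have that, for $k < \nu_1$, $\psi_3$ has a maximum at $\beta = 2\nu_1$, $\psi_2$ is an increasing function for $3k/2 < \beta < 2k$, and $\psi_1$ is an increasing function for $k < \beta < 3k/2$.

Hence $L = k/2\nu_1$.

For $\nu_1 < k < 3\nu_2/4\nu_1$, $\psi_3$ is a decreasing function for $2k < \beta$, but $\psi_1$ and $\psi_2$ are increasing functions in their respective ranges.

Hence $L = 1 - \nu_1/2k$.

For $3\nu_2/4\nu_1 < k < \nu_2/\nu_1$, $\psi_3$ is a decreasing function and $\psi_1$ is an increasing function in their respective ranges, but $\psi_2$ has a maximum for $\beta = 3\nu_2/2\nu_1$ and

\[ L = 1 - \frac{4\nu_1^2}{3\nu_2} + \frac{8k\nu_1^3}{9\nu_2^2}. \]

Finally, for $\nu_2 < k\nu_1$, both $\psi_2$ and $\psi_3$ are decreasing functions in their respective ranges, while $\psi_1$ has a maximum at the value of $\beta$ given by (3.3.1) and $L$ is the value obtained by substituting this value of $\beta$ in $\psi_1$.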


These results were obtained by Royden (1953) by showing that 
the problem could be reduced to one in which the graph of the 
p.d.f. consisted of a bounded number of rectangular blocks. 


3.4 Unimodal distribution: first and second moments about any 
point given 
In this section we take $\mu_1 = 0$, $\mu_2 = 1$, $T$ as the interval $x \le k < 0$, and we suppose that $f(x)$ has a single maximum, say at $x = \delta$.

Since for any choice of $\gamma$ we must have

\[ \int_{-\infty}^{\infty} (x - \delta)(x - \gamma)^2 f'(x) \, dx \le 0 \]

i.e. $3 + 2\gamma\delta + \gamma^2 \ge 0$ (on integrating by parts), then we must have $|\delta| \le \sqrt3$.

In finding $U$ we need $G(x) \le X_T(x)$ for $x < \delta$ and $G(x) \ge X_T(x)$ for $\delta < x$, where $X_T(x) = x - k$ for $x \le k$, $X_T(x) = 0$ for $k \le x$.

If $G(x) = X_T(x)$ at $x = \alpha$ then, by adding a negative multiple of $(x - \delta)(x - \alpha)^2$ to $G(x)$ we can ensure that $G(x) = X_T(x)$ at some other value $x = \beta$. If $\delta \le k$, then from consideration of the number of times the curve $y = G(x)$ (a cubic function) can meet the lines $y = x - k$ or $y = 0$ we must have $G(k) = 0$, $G(\alpha) = 0$ $(k < \alpha)$, and $G'(\alpha) = 0$, unless $G(x) = x - k$, so that $U = 1$.

This gives

\[ G(x) = (x - k)(x - \alpha)^2 / (\delta - \alpha)^2 \]
and
\[ \psi = \psi_0 = (3 + \alpha^2 + 2\alpha k) / (\delta - \alpha)^2. \]

This has a minimum for $\alpha = -(3 + k\delta)/(k + \delta)$ $(> k)$, giving

\[ U = (3 - k^2) / (3 - k^2 + (k + \delta)^2) \le (3 - k^2) / (3 + 3k^2). \qquad (3.4.1) \]
If $k < \delta$ then the graphs of $G(x)$ and $X_T(x)$ must touch at $x = \beta$ $(\beta < k)$ and $x = \alpha$ $(k < \alpha)$, so that

\[ G(x) = A (x - \delta)(x - \alpha)^2 \]

where
\[ G(\beta) = A (\beta - \delta)(\beta - \alpha)^2 = \beta - k \]
and
\[ G'(\beta) = A (\beta - \alpha)(3\beta - \alpha - 2\delta) = 1. \]
Hence
\[ \alpha (k - \delta) + 2\beta^2 - \beta (3k + \delta) + 2k\delta = 0. \]

If we are given $\delta$ we can eliminate $\alpha$, and we find that $\psi$ is a minimum for a value of $\beta$ satisfying a cubic equation with coefficients depending on $\delta$ and $k$. We should then have to maximize $U$ for variation in $\delta$ to obtain a result valid for any $\delta$. It is easier to eliminate $\delta$ at the outset, and we then have

\[ \psi = (3 + \alpha^2 + 2\alpha\delta) / (\beta - \alpha)(3\beta - \alpha - 2\delta) = (\alpha^3 + \alpha^2 \beta + 4\alpha\beta^2 - 6k\alpha\beta + 3\alpha + 3\beta - 6k) / (\alpha - \beta)^3. \qquad (3.4.2) \]

$\alpha$ and $\beta$ have their ranges of variation restricted by the requirements $\beta \le k$, $k \le \alpha$ and $|\delta| \le \sqrt3$.

If $|\delta| = \sqrt3$ then the distribution must be rectangular with centre at the origin and range $2\sqrt3$, so that

\[ U = 0 \ \text{ for } k \le -\sqrt3, \qquad U = (k + \sqrt3)/(2\sqrt3) \ \text{ for } -\sqrt3 \le k \le 0. \qquad (3.4.3) \]

If $\beta = k$ then $\delta = k$ and we obtain the result in (3.4.1). If $\alpha = k$ then, since $\delta + k > 2\beta$, we have $\beta = k$ again.


For stationary values of $\psi$ inside the region in which $\alpha$ and $\beta$ vary we have, from (3.4.2) (denoting the numerator of the expression there by $E$),

\[ \frac{3\alpha^2 + 2\alpha\beta + 4\beta^2 - 6k\beta + 3}{E} - \frac{3}{\alpha - \beta} = 0 \qquad (3.4.4) \]
and
\[ \frac{\alpha^2 + 8\alpha\beta - 6k\alpha + 3}{E} + \frac{3}{\alpha - \beta} = 0. \qquad (3.4.5) \]

(3.4.4) gives $(\alpha + 2\beta - 3k)(2\alpha\beta + \beta^2 + 3) = 0$. If $\alpha = -2\beta + 3k$ then (3.4.5) gives $\beta = (3k^2 + 1)/2k$, whence $\alpha = -1/k$, $\delta = -1/k$ also, and $\psi = 4/9(k^2 + 1)$. If $\alpha = -(\beta^2 + 3)/2\beta$ then (3.4.5) gives $\beta = -1/k$, which is not possible since $\beta \le k < 0$, or $\beta = -\sqrt3$, whence $\alpha = \sqrt3$, $\delta = (6 + 4\sqrt3 k)/(-2k)$ and $\psi = (k + \sqrt3)/(2\sqrt3)$ as in (3.4.3).


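For a bound valid for all $\delta$ we have

\[ U \le \max \left\{ \frac{3 - k^2}{3 + 3k^2}, \ \frac{4}{9(k^2 + 1)}, \ \frac{k + \sqrt3}{2\sqrt3} \right\} = \max \left\{ \frac{3 - k^2}{3 + 3k^2}, \ \frac{4}{9(k^2 + 1)} \right\} \]

since $(k + \sqrt3)/(2\sqrt3)$ is a bound only for $-\sqrt3 \le k \le 0$ and then

\[ \frac{k + \sqrt3}{2\sqrt3} \le \frac{3 - k^2}{3 + 3k^2}. \]

This gives

\[ U \le \frac{3 - k^2}{3 + 3k^2} \ \text{ for } -\sqrt{5/3} \le k \le 0, \qquad U \le \frac{4}{9(k^2 + 1)} \ \text{ for } k \le -\sqrt{5/3}. \]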

By taking G (x) = 0 we have L = 0. 
For k > 0 we can obtain values by symmetry. 
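
The bound valid for all positions of the mode can be written as a single function of $k$. This is a sketch under the reading that the bound is the larger of $(3 - k^2)/(3 + 3k^2)$ and $4/9(k^2 + 1)$, the two meeting at $k = -\sqrt{5/3}$; the function names are ours, and the one-sided bound $1/(1 + k^2)$ of Section 2.6 (no unimodality assumed) is included only for comparison.

```python
import math

def unimodal_one_sided_upper(k):
    """Upper bound for P(x <= k), k < 0, for a unimodal distribution
    with mean 0 and variance 1 (mode unrestricted)."""
    if k >= 0:
        raise ValueError("the bound is stated for k < 0")
    return max((3 - k * k) / (3 + 3 * k * k), 4 / (9 * (k * k + 1)))

def general_one_sided_upper(k):
    """Bound 1/(k^2 + 1) with no restriction on the shape of the p.d.f."""
    return 1 / (k * k + 1)

for k in (-0.5, -1.0, -math.sqrt(5 / 3), -2.0, -3.0):
    print(k, unimodal_one_sided_upper(k), general_one_sided_upper(k))
```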


This result was obtained by Mallows (1956) by a method which 
he applied to a number of such problems. The method is to deter- 
mine “extremal distributions” which are such that their distribution 
functions equal other distribution functions satisfying the same 
moment conditions at as few points as possible (or at one more than 
the least possible number), and then to show that among this class 
of distributions is the one giving the required bound. As we shall 
see in Section 3.6, it is possible by Mallows's method to deal with a
problem intractable by the method used above. 


3.5 A numerical example 


We now take $\mu_1 = 0$, $\mu_2 = 1$, T as $|x| < 2$ and have the conditions
that $f''(x) \ge 0$ for $|x| > 1$ and $f''(x) \le 0$ for $|x| < 1$. We integrate
$g(x)$ and $\chi_T(x)$ twice and obtain

$$X_T(x) = \begin{cases} -2x - 2 & \text{for } x \le -2, \\ \tfrac{1}{2}x^2 & \text{for } |x| \le 2, \\ 2x - 2 & \text{for } 2 \le x \end{cases}$$

(choosing constants of integration arbitrarily, except that $X_T(x)$ is
to be continuous) and to find L we require $G(x) \le X_T(x)$ for
$|x| \ge 1$ and $G(x) \ge X_T(x)$ for $|x| \le 1$. If $G(x)$ satisfies these
conditions then so do $G(-x)$ and $\tfrac{1}{2}(G(x) + G(-x))$, and $\psi$ is the
same in all three cases. Hence we may suppose that $G(x)$ is an even
function of x. We shall then have $G(x) = X_T(x)$ at $x = \pm\alpha$,
$x = \pm 1$, and possibly at $x = 0$ also. If $2 < \alpha$ then $G(\alpha) = 2\alpha - 2$,
$G'(\alpha) = 2$ and so


$$G(x) = \tfrac{1}{2} - \frac{(x^2 - 1)\{(2\alpha - 1)(\alpha - 2)x^2 - (6\alpha^4 - 10\alpha^3 - 2\alpha^2 + 5\alpha)\}}{2\alpha(\alpha^2 - 1)^2}.$$

Hence $G(x) = 2x - 2$ for $x = \alpha$ (double root) and x such that

$$(2\alpha^2 - 5\alpha + 2)(x + \alpha)^2 = 2(\alpha^2 - 1)^2.$$ (3.5.1)

In order that (3.5.1) shall have no root greater than 2 we need

$$(\alpha + 2)^2(2\alpha^2 - 5\alpha + 2) \ge 2(\alpha^2 - 1)^2$$

or $\alpha^3 - 2\alpha^2 - 4\alpha + 2 \ge 0$, i.e. $\alpha \ge \alpha_1 = 3.0861\ldots$.

$G(x) = \tfrac{1}{2}x^2$ for $x = \pm 1$ and for x such that

$$(2\alpha - 1)(\alpha - 2)x^2 + (\alpha^5 - 6\alpha^4 + 8\alpha^3 + 2\alpha^2 - 4\alpha) = 0$$ (3.5.2)

i.e. $(2\alpha - 1)x^2 + \alpha(\alpha^3 - 4\alpha^2 + 2) = 0$.

In order that (3.5.2) shall have no real root we need
$\alpha^3 - 4\alpha^2 + 2 \ge 0$, i.e. $\alpha \ge \alpha_2 = 3.8662\ldots$.

Now $\psi = (6\alpha^4 - 10\alpha^3 - 12\alpha^2 + 30\alpha - 10)/\alpha(\alpha^2 - 1)^2$,
and has stationary values when

$$3\alpha^6 - 10\alpha^5 - 9\alpha^4 + 50\alpha^3 - 31\alpha^2 + 5 = 0.$$

Since this has no roots with $\alpha \ge 3.5$ we have $\max\psi = \psi(\alpha_2) = .9165\ldots$.


For the distribution the graph of whose p.d.f. is the x-axis for
$|x| > \alpha_2$, the line $y = \dfrac{5(x + \alpha_2)}{\alpha_2^2(\alpha_2^2 - 1)}$ for $-\alpha_2 < x < -1$, the
line $y = \left(1 - \dfrac{5}{\alpha_2(\alpha_2 + 1)}\right) + \dfrac{x(\alpha_2^2 - 5)}{\alpha_2^2}$ for $-1 < x < 0$, and
which is symmetrical about the y-axis, we find that

$$\int_{-2}^{2} f(x)\,dx$$

is the same as the value obtained for $\psi(\alpha_2)$ above, while all other
conditions are satisfied if we vary the distribution slightly so as to
"round off the corners". Hence $L = .9165\ldots$.


It may be verified that the case $0 < \alpha \le 2$ leads first to the
conclusion that $\alpha$ must be 0 or 1 and then to the value just given for L.


When we consider U we find that the only possible form for
$G(x)$ is $\tfrac{1}{2}x^2$ which gives $U = 1$. Since there is no term in $x^4$, we
should expect to obtain $U = 1$ as the limiting case of a distribution
with long tails of small probability, and in fact if $f(x)$ is symmetrical
about $x = 0$, with

$$f(x) = \frac{(a - 1)(a + 5)}{2(a^2 + 4a + 5)} \quad \text{for } 0 < x < 1$$

and

$$f(x) = \frac{20(a - x)^3}{(a - 1)^4(a^2 + 4a + 5)} \quad \text{for } 1 < x < a,$$

then

$$\int_{-2}^{2} f(x)\,dx = 1 - O(a^{-2})$$

and all conditions are satisfied for $a > 5$ (again with the "corners
rounded off").

The value $L = .9165\ldots$ may be compared with the values .75
from Tchebycheff's inequality, using no information about the
derivative of $f(x)$, and $.8889\ldots$ from the Gauss-Winkler inequality
using only the unimodality of $f(x)$.
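The figures of this section can be reproduced numerically. The sketch below is illustrative only: it assumes the cubics $\alpha^3 - 2\alpha^2 - 4\alpha + 2$ and $\alpha^3 - 4\alpha^2 + 2$ for $\alpha_1$ and $\alpha_2$, the rational expression for $\psi$ given above, and a tail slope $5/\alpha_2^2(\alpha_2^2 - 1)$ for the piecewise-linear density; the helper names are not from the text.

```python
import math

def bisect_root(func, lo, hi, tol=1e-12):
    """Plain bisection; func must change sign between lo and hi."""
    f_lo = func(lo)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_lo < 0) == (f_mid < 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def psi(alpha):
    """psi as a function of the contact point alpha (alpha > 2)."""
    num = 6 * alpha**4 - 10 * alpha**3 - 12 * alpha**2 + 30 * alpha - 10
    return num / (alpha * (alpha**2 - 1) ** 2)

# Least admissible alpha from (3.5.1) and from (3.5.2).
alpha_1 = bisect_root(lambda a: a**3 - 2 * a**2 - 4 * a + 2, 3.0, 3.5)
alpha_2 = bisect_root(lambda a: a**3 - 4 * a**2 + 2, 3.5, 4.0)
L_bound = psi(alpha_2)

# Pr(|x| < 2) for the piecewise-linear density: one minus the two linear tails.
tail_slope = 5.0 / (alpha_2**2 * (alpha_2**2 - 1.0))
prob_T = 1.0 - tail_slope * (alpha_2 - 2.0) ** 2

# Comparison values for the interval |x| < 2 with unit variance.
tchebycheff = 1.0 - 1.0 / 2.0**2
gauss_winkler = 1.0 - 4.0 / (9.0 * 2.0**2)
```

That `L_bound` and `prob_T` agree is the numerical form of the statement that the piecewise-linear density attains the bound.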


3.6 Restrictions on the magnitude of f (x) or F (x) 


In what has gone before we have used the fact that if
$\psi = a_0 + \ldots + a_n\mu_n$ is a support hyperplane to a convex set
defined as the union of half-spaces then we can express
$\psi - a_0 - \ldots - a_n\mu_n$ in the form

$$\sum p_i\,(a_0 + \ldots + a_n x_i^n - \chi_T(x_i));$$

and this leads to critical distributions which are discrete or else to
continuous distributions in which the p.d.f. has a derivative of some
order which increases by finite amounts at a finite number of points
and is constant in between these points. We can regard a discrete
distribution as the limiting case of one with "high peaks" as the
height of each peak tends to infinity and its breadth tends to zero;
and so, if we restrict the magnitude of $f(x)$, such distributions will
be excluded.

Instead of expressing $\psi - a_0 - \ldots - a_n\mu_n$ as we did above, we
require an expression for it in the form

$$\int f(x)\,(a_0 + \ldots + a_n x^n - \chi_T(x))\,dx;$$

and if $f(x) \le \lambda$ and $\int f(x)\,dx = 1$, then $f(x)$ must be non-zero over
an interval of length at least $1/\lambda$; hence $\psi - a_0 - \ldots - a_n\mu_n$ will
now have a non-zero minimum (in the case when we are finding
L) and we need to estimate this. Instead of finding support hyperplanes
to a convex set, we should find hyperplanes that "do not
intersect too deeply", and we should expect to find that critical
distributions were those in which $f(x)$ was either 0 or $\lambda$. In order to
justify this belief it seems easier to abandon the argument which
relies on convex sets and to use instead the method of Mallows
(1956). To illustrate this method (somewhat simplified since we
are dealing with a specific case) we take the example in which
$f(x) \le \lambda$, we are given that $\mu_1 = 0$, $\mu_2 = 1$, and T is the interval
$x \le k$. From a theorem of Achyeser and Krein given in Shohat and
Tamarkin (1943) (p. 82) we must have $\lambda \ge 1/(2\sqrt{3})$. (This can be
seen otherwise, since for $\lambda < 1/(2\sqrt{3})$ the most compact distribution
is rectangular with range $> 2\sqrt{3}$ and then $\mu_2$ is too large.) If
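$k < -1/(2\lambda)$ we note that

$$f(x) = \epsilon(a - |x|)^2 \quad \text{for } 1/(2\lambda) \le |x| \le a,$$
$$f(x) = \epsilon(a - |x|)^2 + \lambda(1 - \delta) \quad \text{for } |x| \le 1/(2\lambda)$$

satisfies the moment conditions if

$$\tfrac{2}{3}\epsilon a^3 = \delta$$

and

$$\tfrac{1}{15}\epsilon a^5 + (1 - \delta)/12\lambda^2 = 1,$$

and $f(x) \le \lambda$ if $\epsilon a^2 \le \lambda\delta$.
Also

$$\int_{-\infty}^{k} f(x)\,dx = O(\delta),$$

and for sufficiently small $\delta$ all conditions will be satisfied for suitable
$\epsilon$ and a, so that $L = 0$. Hence we suppose in what follows that
$k > -1/(2\lambda)$.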
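Let $F^*(x)$ be the distribution function when $f(x) = \lambda$ for
$k < x < \alpha$ and $\beta < x < \gamma$ $(\gamma \le k)$, where $\alpha$, $\beta$, $\gamma$ satisfy

$$\alpha - k + \gamma - \beta = 1/\lambda$$
$$\alpha^2 - k^2 + \gamma^2 - \beta^2 = 0$$ (3.6.1)
$$\alpha^3 - k^3 + \gamma^3 - \beta^3 = 3/\lambda$$

so that the distribution has the prescribed moments. We need to
verify that such a distribution exists.

Since $2(\alpha^3 + \gamma^3) = 3(\alpha + \gamma)(\alpha^2 + \gamma^2) - (\alpha + \gamma)^3$ we obtain from
(3.6.1)

$$2(k^3 + \beta^3 + 3/\lambda) = 3(k + \beta + 1/\lambda)(k^2 + \beta^2) - (k + \beta + 1/\lambda)^3,$$

whence

$$\beta = -(6\lambda^2 + 3k\lambda + 1)/\lambda(6k\lambda + 3);$$

and since $2\alpha = (\alpha + \gamma) + \sqrt{2(\alpha^2 + \gamma^2) - (\alpha + \gamma)^2}$ we have also

$$2\alpha = k + \beta + 1/\lambda + \sqrt{(k - \beta)^2 - 2(k + \beta)/\lambda - 1/\lambda^2},$$
$$2\gamma = k + \beta + 1/\lambda - \sqrt{(k - \beta)^2 - 2(k + \beta)/\lambda - 1/\lambda^2}.$$

Since

$$(k - \beta)^2 - 2(k + \beta)/\lambda - 1/\lambda^2 = \frac{(6\lambda^2k^2 + 6\lambda^2 - 2)^2 + (72\lambda^2 - 6)(2\lambda k + 1)}{\lambda^2(6\lambda k + 3)^2},$$

which is positive because $12\lambda^2 \ge 1$ and $2\lambda k + 1 > 0$,
$\alpha$ and $\gamma$ are real.

Now $1 + 2\lambda\beta = (1 - 12\lambda^2)/(6k\lambda + 3) \le 0$ and so

$$(\beta - k + 1/\lambda)^2 \le (k - \beta)^2 - 2(k + \beta)/\lambda - 1/\lambda^2$$

and we have $\alpha \ge k$ and also $k \ge \gamma$, as we require. Since $1 + 2k\lambda > 0$
we have also

$$(k - \beta + 1/\lambda)^2 > (k - \beta)^2 - 2(k + \beta)/\lambda - 1/\lambda^2$$

which gives $\gamma > \beta$.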

Suppose now that for some other distribution $F(x)$ satisfying the
same conditions we have $F(k) < F^*(k)$. Then, since the graph of
$F(x)$ nowhere slopes more steeply than the graph of $F^*(x)$, the
curve $y = F(x)$ can cut the curve $y = F^*(x)$ at only one point, say
$x = \theta$, where $\beta < \theta < \gamma$, or else we shall have $F(x) < F^*(x)$ for all x.
In the first case we obtain a contradiction from

$$0 > \int_{-\infty}^{\infty} (x - \theta)(F(x) - F^*(x))\,dx = -\tfrac{1}{2}\int_{-\infty}^{\infty} (x - \theta)^2(F'(x) - F^{*\prime}(x))\,dx = 0$$

and in the second case similarly from

$$0 > \int_{-\infty}^{\infty} (F(x) - F^*(x))\,dx = -\int_{-\infty}^{\infty} x\,(F'(x) - F^{*\prime}(x))\,dx = 0.$$

Hence $F(k) \ge F^*(k) = \lambda(\gamma - \beta)$ with $\beta$, $\gamma$ as given above, and since
$F^*(x)$ is a possible distribution function this gives

$$L = \lambda(\gamma - \beta).$$


As $\lambda$ tends to infinity we find that $\beta$ tends to $-1/k$, $\alpha$ to k, and
$\gamma$ to $-1/k$, while $\lambda(\gamma - \beta)$ is asymptotically $k/(k - \beta)$ which tends to
$k^2/(k^2 + 1)$ as in Section 2.5.

If $\lambda = (2\pi)^{-1/2}$ (the value for the normal distribution with unit
variance) and $k = 2$, then we have $L = .8777\ldots$, in contrast to
$L = .8$ when we impose no upper bound on $f(x)$.
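The construction of $F^*$ lends itself to a direct numerical check. The following Python sketch assumes the closed form $\beta = -(6\lambda^2 + 3k\lambda + 1)/\lambda(6k\lambda + 3)$ and obtains $\alpha$, $\gamma$ as the two roots fixed by (3.6.1); the function names are illustrative. For $\lambda = (2\pi)^{-1/2}$ and $k = 2$ it evaluates to roughly 0.878.

```python
import math

def extremal_intervals(k, lam):
    """(alpha, beta, gamma) for the density equal to lam on (beta, gamma) and (k, alpha)."""
    p = 1.0 / lam
    beta = -(6.0 * lam * lam + 3.0 * k * lam + 1.0) / (lam * (6.0 * k * lam + 3.0))
    disc = (k - beta) ** 2 - 2.0 * (k + beta) * p - p * p
    root = math.sqrt(max(disc, 0.0))  # guard against rounding at the boundary case
    alpha = 0.5 * (k + beta + p + root)
    gamma = 0.5 * (k + beta + p - root)
    return alpha, beta, gamma

def lower_bound(k, lam):
    """L = lam * (gamma - beta); zero when k <= -1/(2 lam)."""
    if lam < 1.0 / (2.0 * math.sqrt(3.0)):
        raise ValueError("no distribution with unit variance has density bounded by lam")
    if k <= -1.0 / (2.0 * lam):
        return 0.0
    _, beta, gamma = extremal_intervals(k, lam)
    return lam * (gamma - beta)

lam_normal = 1.0 / math.sqrt(2.0 * math.pi)
alpha, beta, gamma = extremal_intervals(2.0, lam_normal)

# Moments of the two-block density: total mass, mean and second moment.
mass = lam_normal * ((alpha - 2.0) + (gamma - beta))
mean = lam_normal * ((alpha**2 - 2.0**2) + (gamma**2 - beta**2)) / 2.0
second = lam_normal * ((alpha**3 - 2.0**3) + (gamma**3 - beta**3)) / 3.0
L_value = lower_bound(2.0, lam_normal)
```

Letting `lam` grow large in `lower_bound(2.0, lam)` recovers the unrestricted value $k^2/(k^2 + 1) = .8$.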


Another type of restriction on the distribution function which
was considered by von Mises (1938) is as follows. Let x be a non-negative
variable and let the graph of $y = F(x)$ lie below the straight
line joining $(x_1, F(x_1))$ and $(z, 1)$ for all $x \ge x_0$ (where $x_0 < x_1$) and
for some value $z$ $(\ge x_1)$*. Also let the absolute moment $\nu_r$ be given.
The condition on the graph of $y = F(x)$ can be written as

$$F(x) \le H(x) = F(x_1) + (x - x_1)(1 - F(x_1))/(z - x_1).$$ (3.6.2)


Then

$$\nu_r \ge \int_{x_0}^{\infty} x^r\,dF(x) \ge \int_{x_0}^{z} x^r\,dH(x)$$

(because any value of F corresponds to a larger value of x than does
the same value of H)

i.e.
$$\nu_r \ge \frac{z^{r+1} - x_0^{r+1}}{r + 1}\cdot\frac{1 - F(x_1)}{z - x_1}.$$ (3.6.3)

Since the right-hand side of (3.6.3) tends to infinity as z tends to
infinity or to $x_1 + 0$ it must have a minimum value for some value
$\zeta$ where $x_1 < \zeta$, and this is given by

$$(r + 1)\zeta^r(\zeta - x_1) = \zeta^{r+1} - x_0^{r+1}$$
or
$$r\zeta^{r+1} - (r + 1)x_1\zeta^r + x_0^{r+1} = 0.$$ (3.6.4)

By differentiating once more with respect to $\zeta$, it can be seen that
the equation (3.6.4) defines $\zeta$ $(> x_1)$ uniquely. We now have,
independently of z,

$$\nu_r \ge \frac{\zeta^{r+1} - x_0^{r+1}}{\zeta - x_1}\cdot\frac{1 - F(x_1)}{r + 1}$$

or, using (3.6.4),

$$\nu_r \ge \zeta^r(1 - F(x_1))$$

i.e.
$$F(x_1) \ge 1 - \nu_r/\zeta^r.$$ (3.6.5)
Now from (3.6.2) we have

$$0 \le F(x_0) \le F(x_1) + (x_0 - x_1)(1 - F(x_1))/(z - x_1)$$

and this gives

$$z \ge Z = x_0 + (x_1 - x_0)/F(x_1),$$ (3.6.6)

so that if $\zeta < Z$ we can improve on the inequality (3.6.5).

* This means that the graph of the distribution function lies beneath one
of its tangents at all points to the right of some given point.


Using the value Z for z in (3.6.3) we then have

$$\nu_r \ge (Z^{r+1} - x_0^{r+1})(1 - F(x_1))/(Z - x_1)(r + 1)$$
$$= (Z^{r+1} - x_0^{r+1})/(Z - x_0)(r + 1) \quad \text{(using (3.6.6))}$$
$$= (Z^r + Z^{r-1}x_0 + \ldots + x_0^r)/(r + 1) \ge x_0^r.$$

If $\eta$ is defined by $\nu_r = (\eta^{r+1} - x_0^{r+1})/(r + 1)(\eta - x_0)$ then $Z \le \eta$ and

$$F(x_1) = (x_1 - x_0)/(Z - x_0) \ge (x_1 - x_0)/(\eta - x_0).$$ (3.6.7)

The inequality (3.6.7) holds with equality if the distribution is
rectangular with range from $x_0$ to z; consideration of the inequalities
which lead to (3.6.5) shows that the same distribution must be used
to achieve equality there, but comparison of (3.6.7) and (3.6.5)
shows that equality in the latter is attained only for the special value
$\zeta = \eta = Z$.

For the normal distribution with unit variance, taking $r = 2$, we
have $\nu_2 = 1$, and if $x_1 = 2$, we can take $x_0$ as any value not less
than 0. Taking $x_0$ as 0, its least possible value, we have $\zeta = 3$ and
$\eta = \sqrt{3}$ so that $\zeta > \eta$, and we use (3.6.5) to give $F(2) \ge 1 - \tfrac{1}{9}$ (the
value given by the Gauss-Winkler inequality). If $x_1 = 1$ then
$\zeta < \eta$, and we obtain from (3.6.5) $F(1) \ge 1 - \tfrac{4}{9}$, but from
(3.6.7) we obtain $F(1) \ge 3^{-1/2}$ which is a better value.
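The two worked cases can be checked by solving (3.6.4) for $\zeta$ and the defining equation for $\eta$ numerically. This is a sketch under the readings of (3.6.4), (3.6.5) and (3.6.7) given above; the function names and the bracketing intervals are choices made here, not part of the text.

```python
import math

def bisect_root(func, lo, hi, tol=1e-13):
    """Plain bisection; func(lo) is assumed to have the opposite sign to func beyond the root."""
    f_lo = func(lo)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_lo < 0) == (f_mid < 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def zeta(r, x0, x1):
    """Root greater than x1 of r z^(r+1) - (r+1) x1 z^r + x0^(r+1) = 0, i.e. (3.6.4)."""
    func = lambda z: r * z ** (r + 1) - (r + 1) * x1 * z**r + x0 ** (r + 1)
    # func(x1) < 0 and func((r+1) x1 / r) = x0^(r+1) >= 0, so the root lies between them.
    return bisect_root(func, x1, (r + 1) * x1 / r)

def eta(r, x0, nu):
    """eta such that the rectangular distribution on (x0, eta) has r-th moment nu."""
    if nu <= x0**r:
        raise ValueError("need nu > x0**r")
    moment = lambda e: (e ** (r + 1) - x0 ** (r + 1)) / ((r + 1) * (e - x0))
    hi = x0 + 1.0
    while moment(hi) < nu:
        hi *= 2.0
    return bisect_root(lambda e: moment(e) - nu, x0 + 1e-9, hi)

def von_mises_bounds(r, nu, x0, x1):
    """Return (zeta, eta, bound from (3.6.5), bound from (3.6.7) or None)."""
    z, e = zeta(r, x0, x1), eta(r, x0, nu)
    bound_365 = 1.0 - nu / z**r
    bound_367 = (x1 - x0) / (e - x0) if z < e else None  # (3.6.7) is used only when zeta < eta
    return z, e, bound_365, bound_367

case_x1_2 = von_mises_bounds(2, 1.0, 0.0, 2.0)
case_x1_1 = von_mises_bounds(2, 1.0, 0.0, 1.0)
```

For $x_1 = 2$ only (3.6.5) applies; for $x_1 = 1$ both bounds are returned and the second is the larger.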


3.7 Mean range given for unimodal symmetrical distribution 


As in the case of Chapter II we end the present chapter with an
inequality in terms of mean range w from a sample of given size n.
We suppose that the distribution is symmetrical and unimodal (with
mode at the origin) and find a lower bound for the case when T is
$|x| \le \lambda w$. Although (in contrast to the earlier example of the use
of mean range) we obtain an inequality for a single interval, we
cannot use the method employed in most examples because the
mean range is expressible only as the expectation of a function of the
distribution function; instead we use a special argument.

Let $f(\lambda w) = h$; then for $\lambda w \le x$ we have

$$F(x) \le F(\lambda w) + h(x - \lambda w)$$

from the unimodal property of the distribution. Also, for
$0 \le x \le \lambda w$, we have

$$F(x) \le F(\lambda w) - h(\lambda w - x).$$

Hence, for $0 \le x$,

$$\tfrac{1}{2} \le F(x) \le F_1(x) = F(\lambda w) - h(\lambda w - x).$$ (3.7.1)

Hence

$$w \ge 2\int_0^{x_1} \left(1 - F_1(x)^n - (1 - F_1(x))^n\right) dx$$

where $hx_1 = 1 - F(\lambda w) + h\lambda w$,
i.e. $F_1(x) = 1$ for $x = x_1$, $F_1(x) < 1$ for $x < x_1$.
Hence

$$\tfrac{1}{2}w \ge x_1 - \frac{1 - (F(\lambda w) - h\lambda w)^{n+1}}{(n + 1)h} - \frac{(1 - F(\lambda w) + h\lambda w)^{n+1}}{(n + 1)h}$$
$$= \frac{1 - F(\lambda w) + h\lambda w}{h} - \frac{1}{h(n + 1)} + \frac{(F(\lambda w) - h\lambda w)^{n+1}}{(n + 1)h} - \frac{(1 - F(\lambda w) + h\lambda w)^{n+1}}{(n + 1)h}.$$
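Putting $h\lambda w = t$ and $F(\lambda w) - F(-\lambda w) = 2F(\lambda w) - 1 = P$, we
have

$$\frac{1}{\lambda} \ge \frac{1 - P}{t} + 2 - \frac{2}{(n + 1)t} + \frac{(P + 1 - 2t)^{n+1}}{2^n(n + 1)t} - \frac{(1 - P + 2t)^{n+1}}{2^n(n + 1)t} = Q(t), \text{ say}.$$ (3.7.2)

We minimize $Q(t)$ with respect to t and interpret the resulting
inequality as an inequality for P in terms of $\lambda$.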


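Now

$$t^2Q'(t) = -(1 - P) + \frac{2}{n + 1} - \frac{(P + 1 - 2t)^{n+1}}{2^n(n + 1)} - \frac{(P + 1 - 2t)^n t}{2^{n-1}} + \frac{(1 - P + 2t)^{n+1}}{2^n(n + 1)} - \frac{(1 - P + 2t)^n t}{2^{n-1}}$$

and

$$\frac{d}{dt}\left(t^2Q'(t)\right) = \frac{nt}{2^{n-2}}\left\{(1 + P - 2t)^{n-1} - (1 - P + 2t)^{n-1}\right\} \ge 0$$

since, from (3.7.1), we have $t \le \tfrac{1}{2}P$. When $t = 0$,

$$t^2Q'(t) = -(1 - P) + \frac{2}{n + 1} - \frac{(P + 1)^{n+1}}{2^n(n + 1)} + \frac{(1 - P)^{n+1}}{2^n(n + 1)}.$$

This is zero when $P = 1$ and its derivative with respect to P is
$1 - 2^{-n}(P + 1)^n - 2^{-n}(1 - P)^n \ge 0$ for $0 \le P \le 1$,
so that $t^2Q'(t) \le 0$ when $t = 0$ and increases steadily with t. Now

$$\tfrac{1}{4}P^2\,Q'(\tfrac{1}{2}P) = -1 + P + \frac{2}{n + 1} - \frac{P}{2^{n-1}}$$

and if this is negative, i.e. if

$$P \le \frac{n - 1}{n + 1}\cdot\frac{2^{n-1}}{2^{n-1} - 1},$$

then $Q(t)$ takes its least value for $t = \tfrac{1}{2}P$ (the greatest permissible
value of t) and we have

$$\frac{1}{\lambda} \ge \frac{2(n - 1)}{(n + 1)P}.$$ (3.7.3)

If, however,

$$P > \frac{n - 1}{n + 1}\cdot\frac{2^{n-1}}{2^{n-1} - 1},$$

then $Q'(t)$ vanishes for just one value of t in the interval $0 < t < \tfrac{1}{2}P$
and for this value (3.7.2) gives the inequality for P in terms of $\lambda$.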


To deal with the equations easily put $1 - P + 2t = 2y$ to give

$$0 = 2t - 2y + \frac{2}{n + 1} - \frac{2(1 - y)^{n+1}}{n + 1} - 2t(1 - y)^n + \frac{2y^{n+1}}{n + 1} - 2ty^n$$

or

$$t = \left\{y - \frac{1}{n + 1} - \frac{y^{n+1}}{n + 1} + \frac{(1 - y)^{n+1}}{n + 1}\right\}\Big/\left\{1 - y^n - (1 - y)^n\right\}$$ (3.7.4)

and

$$\frac{1}{\lambda} \ge \frac{2y}{t} - \frac{2}{(n + 1)t} - \frac{2y^{n+1}}{(n + 1)t} + \frac{2(1 - y)^{n+1}}{(n + 1)t} = 2(1 - y^n - (1 - y)^n).$$ (3.7.5)

For a given $\lambda$, (3.7.5) gives bounds on y from which (3.7.4) gives
bounds on t and hence on P. From the method of proof there exists
a distribution for which the bounds are exact.


If with $n = 2$ we take the value $w = 2\pi^{-1/2}$ (which is the correct
value for the normal distribution with unit variance) and $\lambda = \sqrt{\pi}$,
then if $P \le \tfrac{2}{3}$ we have $P \ge 2\sqrt{\pi}/3$ which is impossible. Hence $\tfrac{2}{3} < P$
and, from (3.7.5), we have $y(1 - y) \le 1/(4\sqrt{\pi})$. Also, since
$2y = 1 - P + 2t$, we have $0 \le y \le \tfrac{1}{2}$ and so $0 \le y \le .1699\ldots$.
Hence, since $1 - P = 2(y - t) = (3y - 4y^2)/3(1 - y)$, we have
$1 - P \le .1583\ldots$ or $P \ge .8417\ldots$. The bound obtained is thus less
good than the one given by the Gauss-Winkler inequality, but it
improves as we use the mean range from larger samples. Winsten,
who first obtained the above inequality (1946), gives tables to assist
in the computation of the bound for P.
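The arithmetic of the case $n = 2$ is short enough to script. The sketch below is restricted to $n = 2$ and $\lambda > 1$ (so that the case covered by (3.7.3) is impossible), and uses the relations $y(1 - y) \le 1/4\lambda$ and $1 - P = (3y - 4y^2)/3(1 - y)$ quoted above; the function name is illustrative.

```python
import math

def mean_range_bound_n2(lam):
    """Return (largest admissible y, lower bound for P) for sample size n = 2.

    Sketch for lam > 1 only, where the case P <= 2/3 is ruled out.
    """
    if lam <= 1.0:
        raise ValueError("this sketch assumes lam > 1")
    # Smaller root of y(1 - y) = 1/(4 lam); y <= 1/2 selects this root.
    y_max = 0.5 * (1.0 - math.sqrt(1.0 - 1.0 / lam))
    one_minus_p = (3.0 * y_max - 4.0 * y_max**2) / (3.0 * (1.0 - y_max))
    return y_max, 1.0 - one_minus_p

y_max, p_lower = mean_range_bound_n2(math.sqrt(math.pi))
gauss_winkler = 1.0 - 4.0 / (9.0 * 2.0**2)  # the interval is |x| <= lam * w = 2
```

With $\lambda = \sqrt{\pi}$ the interval is $|x| \le 2$, so the Gauss-Winkler value $8/9$ is the natural comparison.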
CHAPTER IV


MULTIVARIATE DISTRIBUTIONS 


4.1 Introduction 

In this chapter we consider the extension of Tchebycheff's
inequality to n-variate distributions $(n \ge 2)$. As might be expected,
the increase in the number of dimensions is accompanied by an 
increase in the complexity of the working and of the results, and 
although quite a lot of recent work has been on this aspect of the 
problem, there is as yet not the completeness or generality which we 
found in Chapter II. The regions which will be considered in the 
space of the variables are simpler (being in fact all connected sets, 
in contrast to the example in Section 2.4), the conditions assumed 
are simpler (in most cases involving moments of order no higher 
than the second), yet even when a general method exists it leads to 
an algebraic problem for which no method of solution has yet been 
found. Moreover, fewer geometrical restrictions have been employed 
with multivariate distributions. What does carry over from the 
earlier work, however, is the general idea of studying a function 
whose expectation involves the given expectations and using 
properties of convexity. 


4.2 Second-order moments: rectangular region 

In this section we shall assume that we are given the second-order
moments of the distribution (with all means equal to zero) and
that T is the rectangular region $|x_i| \le d_i$ $(i = 1, \ldots, n)$.

By introducing zero probability at infinity we shall have $U = 1$.

Without loss of generality we assume that $d_i = 1$ $(i = 1, \ldots, n)$
since we can scale the $x_i$; we can also apply a more general
transformation to the $x_i$ to deal with the case of a parallelepipedal
region, but we shall do this explicitly in only one example (see
Section 4.6).

Let the given moments be $E(x_i x_j) = \mu_{ij}$ and let M denote the
matrix $(\mu_{ij})$.

The function corresponding to that used in Chapter II is
$g(x_1, \ldots, x_n) = a_0 + x\alpha' + xAx'$, where x is the row vector
$(x_1, \ldots, x_n)$, $\alpha$ is a row vector, and A is a symmetric matrix.

We require $a_0 + x\alpha' + xAx' \le 1$ for all x and

$$a_0 + x\alpha' + xAx' \le 0,$$

except when $|x_i| \le 1$ $(i = 1, \ldots, n)$. If $L > 0$ then $0 < a_0 \le 1$
and A must be negative definite. If $g(x_1, \ldots, x_n) = 1$ when some
Xi, say x,, equals 1, we can add a positive multiple of (x, — 1)? 
to increase a, while if a, — 1 we can add a positive multiple 
of xj to make g (*, ., K) = 1 when x,— 1. Finally we may 
assume that $g(x_1, \ldots, x_n)$ is symmetrical in the $x_i$ so that $\alpha = 0$.
Hence we take $g(x_1, \ldots, x_n)$ as $1 + xAx'$ and we have
$\psi = 1 + \operatorname{tr} AM$ (where "tr" denotes the trace of the matrix).

We now have to maximize $\psi$ with respect to A, subject to A
being negative definite and such that

$$xAx' \le -1$$ (4.2.1)

whenever any one of $x_1, \ldots, x_n$ is numerically greater than unity,
equality obtaining in at least one instance.

We now find the conditions which A satisfies when this
maximum is attained.


First suppose that

$$A = \begin{pmatrix} a_{11} & a \\ a' & A_{22} \end{pmatrix}$$

where a is the row vector $(a_{12}, \ldots, a_{1n})$, and that

$$A^{-1} = B = \begin{pmatrix} b_{11} & b \\ b' & B_{22} \end{pmatrix}.$$

Since $b_{11}a + bA_{22} = (0, \ldots, 0)$ we have

$$b = -b_{11}\,aA_{22}^{-1}.$$ (4.2.2)

Since $b_{11}a_{11} + ba' = 1$ we have, from (4.2.2),

$$b_{11}(a_{11} - aA_{22}^{-1}a') = 1.$$ (4.2.3)

By multiplying the matrices in the reverse order we can interchange
the a's and b's in (4.2.2) and (4.2.3) to give

$$a = -a_{11}\,bB_{22}^{-1}$$ (4.2.4)
and
$$a_{11}(b_{11} - bB_{22}^{-1}b') = 1.$$ (4.2.5)

Writing y for $(x_2, \ldots, x_n)$, we have

$$(x_1, y)A(x_1, y)' = a_{11}x_1^2 + 2ya'x_1 + yA_{22}y'$$
$$= b_{11}^{-1}x_1^2 + (y - b_{11}^{-1}bx_1)A_{22}(y - b_{11}^{-1}bx_1)',$$

using (4.2.2) and (4.2.3). Since $A_{22}$ is negative definite this gives
$xAx' \le b_{11}^{-1}x_1^2$ with equality only for $b_{11}y = bx_1$. Hence, using
(4.2.1), we have $b_{11}^{-1} \le -1$ and similarly $b_{22}^{-1} \le -1, \ldots, b_{nn}^{-1} \le -1$,
with equality in at least one instance.


If $b_{11}^{-1} < -1$ let

$$B(\delta) = \begin{pmatrix} b_{11} + \delta & b \\ b' & B_{22} \end{pmatrix}.$$

Since B is negative definite, so, for sufficiently small $\delta$, is $B(\delta)$ and
so is $(B(\delta))^{-1} = A(\delta)$.

By direct evaluation of $(B(\delta))^{-1}$ the element in its first row and
column, say $a_{11}(\delta)$, is $a_{11}/(1 + a_{11}\delta)$. Using (4.2.4) and (4.2.5),
we have that

$$A(\delta) = a_{11}(\delta)\begin{pmatrix} 1 & -bB_{22}^{-1} \\ -B_{22}^{-1}b' & B_{22}^{-1}b'bB_{22}^{-1} \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ 0 & B_{22}^{-1} \end{pmatrix}.$$

Now, for some $\theta$ such that $0 < \theta < 1$, we can find arbitrarily small
$\delta_1$, $\delta_2$ such that

$$\frac{\theta}{1 + a_{11}\delta_1} + \frac{1 - \theta}{1 + a_{11}\delta_2} = 1$$

and this is the same as $\theta a_{11}(\delta_1) + (1 - \theta)a_{11}(\delta_2) = a_{11}$.
Hence

$$\theta A(\delta_1) + (1 - \theta)A(\delta_2) = A,$$

and A is an interior point of the set of possible matrices. Hence
$\psi = 1 + \operatorname{tr} AM$, which is linear in the elements of A, cannot
achieve its maximum at A. Hence, for maximum $\psi$, we must have

$$b_{ii} = -1 \text{ for } i = 1, \ldots, n.$$ (4.2.6)


The restriction that B be negative definite means that its
characteristic roots are negative; the limiting case is when a
characteristic root of B tends to zero and consequently a characteristic
root of $B^{-1}$ tends to minus infinity. This would make $\operatorname{tr} B^{-1}M$
tend to minus infinity, and hence the maximum of $\operatorname{tr} B^{-1}M$ occurs
at an interior point of the set of matrices B. If we consider the
matrices B which are linear in some variable t then we have

$$\frac{d^2}{dt^2}(\operatorname{tr} B^{-1}M) = \frac{d}{dt}\left(-\operatorname{tr} B^{-1}\frac{dB}{dt}B^{-1}M\right)$$

(since $B\,\dfrac{d}{dt}(B^{-1}) + \dfrac{dB}{dt}B^{-1} = 0$)

$$= 2\operatorname{tr} B^{-1}\frac{dB}{dt}B^{-1}\frac{dB}{dt}B^{-1}M$$

which is non-positive since $B^{-1}$ is negative definite and M is positive
definite. Hence the maximum with respect to t of $\operatorname{tr} B^{-1}M$ is
unique; let it be attained when $B = C$.


Consider in particular the case when t is $b_{ij}$, an element of B off
the main diagonal. Since for any matrix G, $BGB^{-1}$ and G have the
same characteristic roots and hence the same trace, we have that

$$\operatorname{tr} B^{-1}\frac{dB}{dt}B^{-1}M = \operatorname{tr}\frac{dB}{dt}B^{-1}MB^{-1} = 2d_{ij},$$

where $D = B^{-1}MB^{-1}$. But at the maximum

$$\frac{d}{db_{ij}}(\operatorname{tr} B^{-1}M) = -\operatorname{tr} B^{-1}\frac{dB}{db_{ij}}B^{-1}M = 0,$$

so that $C^{-1}MC^{-1}$ is a diagonal matrix, say with elements $\gamma_1, \ldots, \gamma_n$
(which are positive, since M is positive definite). Now $C^{-1}MC^{-1}C$
has diagonal elements $-\gamma_1, \ldots, -\gamma_n$ and therefore $\operatorname{tr} C^{-1}M =
-(\gamma_1 + \ldots + \gamma_n)$ so that $L = 1 - (\gamma_1 + \ldots + \gamma_n)$ (unless L is 0).

L is attained by discrete distributions as follows. If
$\gamma_1 + \ldots + \gamma_n \le 1$, let $\tfrac{1}{2}\gamma_i$ be the probability at the point
$(c_{i1}, \ldots, c_{in})$ and its negative, and let there be probability
$1 - (\gamma_1 + \ldots + \gamma_n)$ at the origin. Since $c_{ii} = -1$, the probabilities
$\gamma_i$ can be counted as falling outside T, and we have
$P(T) = 1 - (\gamma_1 + \ldots + \gamma_n)$; we have means zero and also

$$E(x'x) = CC^{-1}MC^{-1}C = M, \text{ as required.}$$

If $1 < \gamma_1 + \ldots + \gamma_n$ then we take probabilities $\gamma_i/2\sum\gamma_i$ at
$\pm\sqrt{\sum\gamma_i}\,(c_{i1}, \ldots, c_{in})$. This gives the correct covariance matrix,
and at least one coordinate is greater than unity in absolute value.

We now consider how to solve for B, and hence $\gamma_1, \ldots, \gamma_n$, for
a given matrix M.

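If $n = 2$ then B is $\begin{pmatrix} -1 & \beta \\ \beta & -1 \end{pmatrix}$ with $|\beta| < 1$ for negative
definiteness, and, if M is $\begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}$, we require

$$B^{-1}MB^{-1} = \frac{1}{(1 - \beta^2)^2}\begin{pmatrix} 1 & \beta \\ \beta & 1 \end{pmatrix}\begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}\begin{pmatrix} 1 & \beta \\ \beta & 1 \end{pmatrix}$$

to be diagonal, which leads to

$$\sigma_1\sigma_2\rho\beta^2 + (\sigma_1^2 + \sigma_2^2)\beta + \sigma_1\sigma_2\rho = 0.$$

The product of the roots of this quadratic in $\beta$ is 1, and the sum has
sign opposite to that of $\rho$. Hence

$$\beta = \left(-(\sigma_1^2 + \sigma_2^2) + \sqrt{(\sigma_1^2 + \sigma_2^2)^2 - 4\sigma_1^2\sigma_2^2\rho^2}\right)\big/2\sigma_1\sigma_2\rho.$$

Then

$$\operatorname{tr} B^{-1}M = -(\sigma_1^2 + 2\beta\rho\sigma_1\sigma_2 + \sigma_2^2)/(1 - \beta^2),$$

and this is

$$-\tfrac{1}{2}\left(\sigma_1^2 + \sigma_2^2 + \sqrt{(\sigma_1^2 + \sigma_2^2)^2 - 4\sigma_1^2\sigma_2^2\rho^2}\right).$$

Hence L is

$$1 - \tfrac{1}{2}\left(\sigma_1^2 + \sigma_2^2 + \sqrt{(\sigma_1^2 + \sigma_2^2)^2 - 4\sigma_1^2\sigma_2^2\rho^2}\right).$$

This is equivalent to a result due to Lal (1955); the special case
$\sigma_1 = \sigma_2$ was dealt with earlier by Berge (1937).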
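The closed form for $n = 2$ can be checked against the trace expression from which it was derived. The following is a small Python sketch; the function names are illustrative, and the truncation at zero follows the usual convention that L cannot be negative.

```python
import math

def rectangle_bound_2d(s1, s2, rho):
    """L for the square |x1| <= 1, |x2| <= 1 with standard deviations s1, s2 and correlation rho."""
    total = s1 * s1 + s2 * s2
    root = math.sqrt(max(0.0, total * total - 4.0 * (s1 * s2 * rho) ** 2))
    return max(0.0, 1.0 - 0.5 * (total + root))

def trace_form(s1, s2, rho):
    """The same quantity computed as 1 + tr(B^-1 M) from the root beta; rho must be non-zero."""
    total = s1 * s1 + s2 * s2
    c = rho * s1 * s2
    root = math.sqrt(total * total - 4.0 * c * c)
    beta = (-total + root) / (2.0 * c)  # the root of the quadratic with |beta| < 1
    trace = -(s1 * s1 + 2.0 * beta * c + s2 * s2) / (1.0 - beta * beta)
    return beta, 1.0 + trace

beta, via_trace = trace_form(0.5, 0.4, 0.3)
closed_form = rectangle_bound_2d(0.5, 0.4, 0.3)
```

Two limiting cases are useful checks: $\rho = 0$ gives $1 - \sigma_1^2 - \sigma_2^2$, and $\rho = 1$ with equal variances reduces to the one-dimensional Tchebycheff value $1 - \sigma^2$.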


For larger values of n no general method (other than solving by
successive approximation the equations which arise) is known for
finding B and A, and this presents one of the most interesting
unsolved problems of the subject.


The theory above is due to Olkin and Pratt (1958); the same 
problem was solved in similar terms by Whittle (1958b), inde- 
pendently but slightly later. 


An explicit result using only some covariances was given by
Birnbaum and Marshall (1961). It is assumed that not more than
$(n - 1)$ values of $\rho_{ij}$ are known, including, for given j, not more
than one value with $i < j$. The result is not, in general, best possible.


4.3 Second-order moments: region similar to an orthant 

If in the last example we replace $|x_i| \le d_i$ by $x_i \le d_i$, then T is
similar to an orthant (i.e. the generalization to n dimensions of a
quadrant in two dimensions and an octant in three).

As before, we take $d_i = 1$ $(i = 1, \ldots, n)$ and to find L we consider
$g = a_0 + x\alpha' + xAx'$, where now

$$a_0 + x\alpha' + xAx' \le 1 \text{ for all } x$$
and
$$a_0 + x\alpha' + xAx' \le 0 \text{ except when } x_i \le 1\ (i = 1, \ldots, n).$$


We can again take A to be negative definite, but we can no longer
appeal to symmetry to make $\alpha = (0, \ldots, 0)$ nor $a_0 = 1$. However,
if all the coefficients of correlation between the x's are equal and all
variances are equal, we can use symmetry in a different way to
ensure that all the components of $\alpha$ are equal, and write
$a_0 + x\alpha' + xAx'$ as $(x - ke)A(x - ke)'$, where e is $(1, \ldots, 1)$.
(To do this we take $(n!)^{-1}\sum g$ over all permutations of $x_1, \ldots, x_n$.)

The theory developed in Section 4.2 still applies, and we can
show that $A^{-1}$ must be negative definite with diagonal terms all $-1$.
This was done by Marshall and Olkin (1960a), and their result is
(in our terminology)
L = max (0,


1 n (A/[(1 + (n — 1)p)(1 + o? — . (n — 1Y1 — p))] + (n — DV — er) 
a (n + œ (1 + (n — 1)p)}¥* 


where $\sigma^2 = E(x_i^2)$, $\rho\sigma^2 = E(x_i x_j)$ $(i \ne j)$.


The case of unequal covariances has not been considered, and
although particular cases could be dealt with numerically by
maximizing $\psi = E(g)$, it seems likely that a general expression
would be complicated.


4.4 A continuous set of variables 


The ideas of the previous section have been generalized to the
case of a continuous set of variables by Whittle (1958a).


We let E (x (s) x (t)) = v (s, t), where 0 « s, t < 1, E (x(t)) = O, 
v (t,t) = 4? (t), and 


(Sei. - vo. 


and we suppose that 


$E(x(t)^2)$ is finite for $0 \le t \le 1$ (4.4.1)
and that $x'(t)$ exists and

$E(x'(t)^2)$ is finite for $0 \le t \le 1$. (4.4.2)
Now consider the functional 
S (x,y) 
_ x (0) y (0) + x (1) y (1) 


L f ^ (a 
3 + asa |, Px (0 O +x (Dy () at, 
and let P be the probability that $|x(t)| \le a$ in (0, 1).


We need to be able to invert the order of the limit operations
involved in taking expectations, integrating over t, and differentiating
x with respect to t; conditions (4.4.1) and (4.4.2) are sufficient
for this. (See, e.g., Loève (1955), Section 7.2, for the relevant
theory.)
We then have
E (s(x,x)) = Pur $* (1) 4 zal, (029? (t) + Y? () dt 


«90 E90) , L (f'g 4. (voa). 


If y, (t) a it may be verified, by integration by parts, 
that S (x, y, (t) = x (s), while S (y, (t), y, (t)) = . 


Using Schwarz's inequality, we have 
g q y 


S (x(t), x (2)) 
and so $E(S(x, x)) \ge 1 - P$, whence we obtain a bound for P.


Whittle discusses the conditions on the behaviour of x to ensure
the existence of a function such as $y_s(t)$ above.


4.5 Variances given: rectangular regions 


As another variation on the problem in Section 4.2 we may
suppose that covariances are not given, so that g is now of the form
$a_0 + x\alpha' + xAx'$ with A diagonal. As before, we can take
$\alpha = (0, \ldots, 0)$ and $a_0 = 1$, so that g becomes $1 + \sum a_{ii}x_i^2$. Every
$a_{ii}$ must be less than or equal to $-1$, and the maximum of $E(g)$ is
obviously attained by making the $a_{ii}$ all $-1$, giving

$$L = \max\left(0,\ 1 - \sum\sigma_i^2\right).$$ (4.5.1)

If $\sum\sigma_i^2 \le 1$ then we can obtain this value of L with the discrete
distribution for which $\Pr(0, \ldots, 0) = 1 - \sum\sigma_i^2$, $\Pr(\pm 1, 0, \ldots, 0)
= \tfrac{1}{2}\sigma_1^2$, etc. It will be noticed that in this distribution we have all
covariances zero, but that the distributions of $x_1, \ldots, x_n$ are not
independent. If the distributions were known to be independent
then the repeated application of Tchebycheff's inequality in one
dimension would give the stronger bound $L = (1 - \sigma_1^2)\ldots(1 - \sigma_n^2)$.
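The gap between the two bounds is easily tabulated. A minimal Python sketch follows; the function names are illustrative, and the product form is computed only when every variance is at most 1, so that each factor is a genuine Tchebycheff bound.

```python
import math

def variance_only_bound(variances):
    """L of (4.5.1): only the variances are given."""
    return max(0.0, 1.0 - sum(variances))

def independent_bound(variances):
    """Product of one-dimensional Tchebycheff bounds, valid under independence."""
    if any(v > 1.0 for v in variances):
        return 0.0
    return math.prod(1.0 - v for v in variances)

example = [0.1, 0.2]
without_independence = variance_only_bound(example)  # 1 - 0.3
with_independence = independent_bound(example)       # 0.9 * 0.8
```

The product always dominates $1 - \sum\sigma_i^2$ when the variances lie in [0, 1], which is the sense in which independence gives the stronger bound.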


In the same way that the method for a finite rectangular region
was extended in Section 4.3 to a region similar to an orthant, so the
inequality (4.5.1) can be extended to the same region to give

$$L = \max\left(0,\ 1 - \sum\frac{\sigma_i^2}{1 + \sigma_i^2}\right);$$

this was done by Marshall and Olkin (1960a).


4.6 Bivariate distribution: convex polygonal region 


As an illustration of what can be achieved by transforming
variables, we show how to obtain L when T is the plane convex
polygonal region, symmetrical about the origin, formed by the
intersection of m strips derived by rotation of $|x_1| \le d_i$ through
angles $\alpha_i$ $(i = 1, \ldots, m)$. We suppose that means are zero and that
variances and covariances are given. It will be noted that if in
Section 4.2 we take $n = 2$ then we have the special case of the present
problem in which $m = 2$, $\alpha_1 = 0$, and $\alpha_2 = \tfrac{1}{2}\pi$.


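Hence we know that if T is $|x_1| \le 1$, $|x_2| \le 1$ and

$$M = \begin{pmatrix} Ex_1^2 & Ex_1x_2 \\ Ex_1x_2 & Ex_2^2 \end{pmatrix} = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}$$

then L is $\max\left[0,\ 1 - \tfrac{1}{2}\left(\sigma_1^2 + \sigma_2^2 + \sqrt{(\sigma_1^2 + \sigma_2^2)^2 - 4\rho^2\sigma_1^2\sigma_2^2}\right)\right]$, and
that we find this value for L by taking for $g(x_1, x_2)$ the expression
$1 + (x_1, x_2)A(x_1, x_2)'$ where $A^{-1}$ is

$$\begin{pmatrix} -1 & \beta \\ \beta & -1 \end{pmatrix}$$

and $\beta = \left(-(\sigma_1^2 + \sigma_2^2) + \sqrt{(\sigma_1^2 + \sigma_2^2)^2 - 4\rho^2\sigma_1^2\sigma_2^2}\right)\big/2\rho\sigma_1\sigma_2$.

In this case $g = 0$ is an ellipse which touches the lines $x_1 = \pm 1$,
$x_2 = \pm 1$.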


If we now have the region

$$|c_{11}x_1 + c_{12}x_2| \le d_1, \quad |c_{21}x_1 + c_{22}x_2| \le d_2,$$

then we first put

$$(y_1, y_2) = (x_1, x_2)\begin{pmatrix} c_{11}/d_1 & c_{21}/d_2 \\ c_{12}/d_1 & c_{22}/d_2 \end{pmatrix}$$

or $y = xC$, and g becomes $1 + xCAC'x'$ with $A^{-1}$ of the form given above
and $\beta$ derived from the values, not in M, but in $C'MC$.


If the ellipse $g = 0$ lies inside all the other strips then the value
of L obtained is the correct one for the polygon; if this does not
happen then we must consider ellipses which lie inside the hexagons
formed by the intersection of three strips. Again we can take a
particular case and obtain the general one by transformation, and
we therefore suppose that the hexagon is defined by

$$|x_1| \le 1, \quad |x_2| \le 1, \quad |x_1\cos\alpha + x_2\sin\alpha| \le d.$$

By changing the sign of $x_2$ if necessary we may suppose that
$0 < \alpha < \tfrac{1}{2}\pi$, and in order that the region be strictly a hexagon, we
need $(d^2 - 1)^2 < \sin^2 2\alpha$. Let $(d^2 - 1) = \gamma\sin 2\alpha$.

The ellipse $Q(x_1, x_2) = x_1^2 + x_2^2 - 2\gamma x_1x_2 - 1 + \gamma^2 = 0$ is
inscribed to the hexagon and touches the sides at $A(1, \gamma)$, $B(\gamma, 1)$
and $C((\gamma\sin\alpha + \cos\alpha)/d,\ (\gamma\cos\alpha + \sin\alpha)/d)$ respectively and
also at $-A$, $-B$ and $-C$ (the reflections of A, B and C in the
origin). Outside the hexagon we have $Q \ge 0$ and everywhere
$Q \ge \gamma^2 - 1$, so that if P is the probability of $(x_1, x_2)$ being inside the
hexagon we have

$$E(Q) = \sigma_1^2 + \sigma_2^2 - 2\gamma\rho\sigma_1\sigma_2 - 1 + \gamma^2 \ge P(\gamma^2 - 1)$$
or
$$P \ge P_0 = 1 - (\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2\gamma)/(1 - \gamma^2).$$

We now show that this bound (if $P_0 \ge 0$) is sharp.


Consider the discrete distribution with probabilities $\tfrac{1}{2}p_1$ at A
and $-A$, $\tfrac{1}{2}p_2$ at B and $-B$, $\tfrac{1}{2}p_3$ at C and $-C$, and $1 - p_1 - p_2 - p_3$
at the origin, where

$$p_1 = \frac{\sigma_1^2(\gamma\cos\alpha + \sin\alpha) + \sigma_2^2(\gamma^2\sin\alpha + \gamma\cos\alpha) - \rho\sigma_1\sigma_2(\gamma^2\cos\alpha + 2\gamma\sin\alpha + \cos\alpha)}{\sin\alpha\,(1 - \gamma^2)^2},$$

$$p_2 = \frac{\sigma_1^2(\gamma^2\cos\alpha + \gamma\sin\alpha) + \sigma_2^2(\gamma\sin\alpha + \cos\alpha) - \rho\sigma_1\sigma_2(\gamma^2\sin\alpha + 2\gamma\cos\alpha + \sin\alpha)}{\cos\alpha\,(1 - \gamma^2)^2},$$

$$p_3 = \frac{(-\gamma\sigma_1^2 - \gamma\sigma_2^2 + \rho\sigma_1\sigma_2(1 + \gamma^2))(1 + \gamma\sin 2\alpha)}{\sin\alpha\cos\alpha\,(1 - \gamma^2)^2}.$$

Then all moments are correct and $1 - p_1 - p_2 - p_3 = P_0$.


Now since we are considering a hexagon and not merely a
parallelogram, the ellipses obtained above for the parallelograms
formed by any two of the strips must each intersect the sides of the
third strip which determines the hexagon; in particular the ellipse
$-1 = (x_1, x_2)A(x_1, x_2)'$ intersects the line $d = x_1\cos\alpha + x_2\sin\alpha$
if $(\beta + \gamma)\sin\alpha\cos\alpha < 0$, so that

$$p_3 = \rho\sigma_1\sigma_2(\gamma + \beta)(\gamma + \beta^{-1})(1 + \gamma\sin 2\alpha)/\sin\alpha\cos\alpha\,(1 - \gamma^2)^2 > 0;$$

similarly $p_1 > 0$, $p_2 > 0$ and the distribution we have considered is
an actual one.


If $P_0 < 0$ we can decrease $\sigma_1$ and $\sigma_2$ until $P_0 = 0$ and construct
a discrete distribution with probabilities on the boundary of the
hexagon and none inside. Then restoring $\sigma_1$ and $\sigma_2$ to their original
values has the effect of moving these probabilities away from the
origin and so giving zero probability in T. Thus we have proved that

$$L = \max(0, P_0).$$
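The hexagon bound and the extremal distribution can be verified numerically. The sketch below assumes the expressions for $p_1$, $p_2$, $p_3$ and the contact points A, B, C as given above; the function name and the parameter values in the example are illustrative.

```python
import math

def hexagon_extremal(s1, s2, rho, alpha, d):
    """Return (P0, (p1, p2, p3), (A, B, C)) for |x1|<=1, |x2|<=1, |x1 cos a + x2 sin a|<=d."""
    sa, ca = math.sin(alpha), math.cos(alpha)
    g = (d * d - 1.0) / math.sin(2.0 * alpha)  # gamma, defined by d^2 - 1 = gamma sin 2a
    if abs(g) >= 1.0:
        raise ValueError("the region is not strictly a hexagon")
    v1, v2, c = s1 * s1, s2 * s2, rho * s1 * s2
    k2 = (1.0 - g * g) ** 2
    p1 = (v1 * (g * ca + sa) + v2 * (g * g * sa + g * ca)
          - c * (g * g * ca + 2.0 * g * sa + ca)) / (sa * k2)
    p2 = (v1 * (g * g * ca + g * sa) + v2 * (g * sa + ca)
          - c * (g * g * sa + 2.0 * g * ca + sa)) / (ca * k2)
    p3 = ((-g * v1 - g * v2 + c * (1.0 + g * g))
          * (1.0 + g * math.sin(2.0 * alpha))) / (sa * ca * k2)
    points = ((1.0, g), (g, 1.0), ((g * sa + ca) / d, (g * ca + sa) / d))
    p0 = 1.0 - (v1 + v2 - 2.0 * c * g) / (1.0 - g * g)
    return p0, (p1, p2, p3), points

s1, s2, rho = 0.5, 0.5, 0.2
p0, probs, points = hexagon_extremal(s1, s2, rho, math.pi / 4.0, 0.9)

# Second moments of the symmetric discrete distribution (mass p_i split between +/- each point).
m11 = sum(p * x * x for p, (x, y) in zip(probs, points))
m22 = sum(p * y * y for p, (x, y) in zip(probs, points))
m12 = sum(p * x * y for p, (x, y) in zip(probs, points))
L_hexagon = max(0.0, p0)
```

In this example all three probabilities are positive, so the distribution is an actual one and the bound $P_0$ is attained.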


For a general polygon we have to consider first all possible 
parallelograms and then, if these fail, all possible hexagons, selecting 
the least bound given. 


This problem was originally discussed by Marshall and Olkin 
(1960c); they showed in addition that if the widths of the strips are 
equal the only hexagons to be considered are those formed by 
adjacent strips, in the sense that no other strip defining the polygon 
is obtained by a rotation of the first strip through an angle less than 
those defining the hexagon. 


4.7 A general theorem (first- and second-order moments given) 


In this section we give a theorem, due to Marshall and Olkin
(1960b), which reduces the problem of finding bounds on probability
in certain problems to the minimization of a quadratic form
under given conditions. As in the problem studied in Section 4.2
the solution may not be explicit, but we give one application where
it is.

We suppose that T is a closed convex set and T* the union of
T and $-T$, where $-T$ is the image of T in the origin (means are
chosen to be zero). Let the variance-covariance matrix $E(x'x)$ be
denoted by M.

If A is the set of all vectors a such that $ax' \ge 1$ for all x in T, then
for T we have

$$U = \inf_{a \text{ in } A}\frac{aMa'}{1 + aMa'}$$ (4.7.1)

while for T*

$$U = \inf_{a \text{ in } A} aMa'.$$ (4.7.2)


a in 4 


To prove (4.7.1) we consider $g(x) = (ax' + aMa')^2/(1 + aMa')^2$.
We have $g(x) \ge 1$ for x in T and so obtain

$$aMa'/(1 + aMa') = E(g(x)) \ge \Pr(x \text{ in } T).$$ (4.7.3)

Now let $q = q(a) = aMa'$, $Q = q/(1 + q)$, $w = aM/q$. Using
Cauchy's inequality*, we have $xMx'\cdot wM^{-1}w' \ge (xw')^2$, whence
$xMx' \ge q(xw')^2 \ge Q(xw')^2 = Qxw'wx'$. Hence $x(M - Qw'w)x' \ge 0$,
and equality can obtain only if $Q = q$ or $xw' = 0$, which means
that $q = 0$ or $xw' = 0$; therefore equality obtains only when
$xMx' = 0$. Since M is the matrix of a positive definite form so is
$M - Qw'w$. Hence there is a non-singular matrix N such that
$M - Qw'w = N'N$. Since $xMx'$ is positive definite we shall have
$\inf aMa'$ taken at some finite point, say $a_0$. We now show that
$w_0 = w(a_0)$ belongs to T.

If this is not so then, since T is convex and closed, there is a
hyperplane separating $w_0$ and T; i.e. there is a vector v and a constant
$\alpha$ such that $vw_0' < \alpha$ but $vx' \ge \alpha$ if x is in T. Since $a_0w_0' = 1$, we have
$(v + (1 - \alpha)a_0)w_0' < 1$ and also $(v + (1 - \alpha)a_0)x' \ge 1$ for x in T,
and we may therefore suppose that $\alpha = 1$ by taking $v + (1 - \alpha)a_0$ in
place of v. Now $vw_0' = vMa_0'/q$ and so $vMa_0' < a_0Ma_0'$, i.e.
$a_0Mv' < a_0Ma_0'$. Hence for a sufficiently small positive $\epsilon$, we shall
have $\epsilon(vMv' - 2a_0Mv' + a_0Ma_0') < 2(a_0Ma_0' - a_0Mv')$, i.e.

$$uMu' < a_0Ma_0'$$ (4.7.4)

where $u = \epsilon v + (1 - \epsilon)a_0$. However, if x belongs to T we have
$ux' \ge \epsilon + (1 - \epsilon) = 1$ and so (4.7.4) contradicts the definition of $a_0$.
Hence $w_0$ is in T.

* For this form of the inequality see, e.g., Beckenbach and Bellman (1961),
p. 69.

We now show that the bound obtained for probability in (4.7.3) 
is sharp, so that we have the actual value of U. Q, w, etc. are all to be 
evaluated at a, and we shall drop suffixes. 

Let $D = \mathrm{diag}(p_1^{1/2}, ..., p_n^{1/2})$ (i.e. the square matrix with
$p_1^{1/2}, ..., p_n^{1/2}$ on the diagonal and zero elsewhere) where
$eD = -QwN^{-1}$ and $e = (1, ..., 1)$, and let $C = D^{-1}N$. We have
$\Sigma p_i = eD^2e' = Q^2 wN^{-1}N'^{-1}w' = Q^2 w(M - Qw'w)^{-1}w'$. Now since
$qw' = Ma'$ and $wa' = 1$, we have $w(M - Qw'w)^{-1}(M - Qw'w)a' = 1$, whence
$w(M - Qw'w)^{-1}w' = 1/(q - Q) = (1 - Q)/Q^2$ and so $\Sigma p_i = 1 - Q$.
Let $C^{(i)}$ be the $i$th row of $C$, and consider the discrete
distribution with probabilities $p_i$ at $C^{(i)}$ and $Q$ at $w$. This
distribution has mean $wQ + eD^2C = wQ + eDN = wQ - Qw = 0$ and
variance-covariance matrix $Qw'w + C'D^2C = Qw'w + N'N = M$
and so satisfies all the given conditions. Since the probability at $w$
(which belongs to $T$) is $Q$, we have $U \ge Q$, whereas from (4.7.3)
we have $U \le Q$. Hence $U = Q$.


We note that since N is not unique the distribution found to 
give the value for U is not unique. 


For the region $T^*$ the argument is similar; we take $g(x) = (ax')^2$
(and so do not use the fact that $E(x) = 0$) and we find that $M - qw'w$
is positive semi-definite, so that now $N$ is singular. We choose
positive $p_1, ..., p_n$ to satisfy $p_1 + ... + p_n = 1 - q$ and put
$C = D^{-1}N$ where $D = \mathrm{diag}(p_1^{1/2}, ..., p_n^{1/2})$. Finally we split the
probabilities $p_1, ..., p_n, q$ equally between $C^{(i)}$ and $-C^{(i)}$ or $w$
and $-w$.

As an application we take $T$ to be the region in which every
component of $x$ is not less than 1; thus $T^*$ is the region in which all
components have the same sign and modulus not less than 1. We
look for the minimum of $aMa'$ subject to $a \ge 0$, $ae' \ge 1$; we must
have $a \ge 0$ since if $a_1$, say, were negative we could find a point
$(X, 1, ..., 1)$ in $T$ which gave $ax' < 1$ for sufficiently large $X$. If the
minimum occurs at an $a$ with non-vanishing components $b$ then we
can take $be' = 1$ (here $e$ is a row of 1's, as many as there are elements
in $b$); differentiation of $bM_b b' + \lambda(be' - 1)$ (where $M_b$ is the
principal submatrix of $M$ corresponding to the columns in $b$) gives
$2M_b b' + \lambda e' = 0$. Since $be' = 1$ we have $2bM_b b' + \lambda = 0$ and

so $$b' = -\tfrac{1}{2}\lambda M_b^{-1}e' = M_b^{-1}e'\,(bM_b b').$$
Hence $$bM_b b' = (bM_b b')^2\, eM_b^{-1}M_b M_b^{-1}e',$$
i.e. $$bM_b b' = 1/(eM_b^{-1}e')$$
and so $$b' = M_b^{-1}e'/(eM_b^{-1}e').$$

Hence $\min aMa'$ is given by $1/\max eM_b^{-1}e'$ where $M_b^{-1}e' \ge 0$.
This maximum always exists since a possible M, is given by any 
diagonal term of M; it will be noticed that finding the minimum 
involves only the consideration of a finite number of submatrices, 
and not, as in the case of the problem discussed in Section 4.2, the 
solution of sets of equations. 
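
The finite search described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names (`solve`, `min_quadratic_form`, `bounds`) are hypothetical, the matrix M is supplied as a list of rows and is assumed positive definite, and the bound for T* is capped at 1 because it is a probability.

```python
from itertools import combinations


def solve(A, rhs):
    """Solve A y = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [[float(v) for v in A[i]] + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    y = [0.0] * n
    for i in reversed(range(n)):
        s = aug[i][n] - sum(aug[i][j] * y[j] for j in range(i + 1, n))
        y[i] = s / aug[i][i]
    return y


def min_quadratic_form(M):
    """min aMa' over a >= 0, ae' >= 1, computed as 1 / max e M_b^{-1} e'."""
    n = len(M)
    best = 0.0
    for size in range(1, n + 1):
        for idx in combinations(range(n), size):
            Mb = [[M[i][j] for j in idx] for i in idx]  # principal submatrix
            y = solve(Mb, [1.0] * size)                 # y = M_b^{-1} e'
            if all(v >= -1e-12 for v in y):             # need M_b^{-1} e' >= 0
                best = max(best, sum(y))                # e M_b^{-1} e'
    return 1.0 / best


def bounds(M):
    """Return (U for T from (4.7.1), U for T* from (4.7.2), capped at 1)."""
    q = min_quadratic_form(M)
    return q / (1.0 + q), min(1.0, q)
```

For two uncorrelated components with unit variance, `bounds([[1.0, 0.0], [0.0, 1.0]])` uses the full matrix and gives q = 1/2; when the full matrix is inadmissible (some component of $M^{-1}e'$ negative) the search falls back on smaller submatrices, of which single diagonal terms are always admissible.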


Marshall and Olkin also apply the general theorem to the case
when $T$ is $x_1 ... x_n \ge 1$, $x \ge 0$, but then the solution is not explicit
as was the one which we have just obtained.


4.8 Elliptical region: independent variables 


In the above results we obtain equality only when the distri- 
bution is concentrated on a quadric or at its centre; if we stipulate 
that the variables should be independently distributed then we 
exclude these cases. 


Suppose that $T$ is $x_1^2/s_1^2 + x_2^2/s_2^2 \ge 1$ and the variances of $x_1$, $x_2$ are
$\sigma_1^2$, $\sigma_2^2$ respectively where $\sigma_1^2/s_1^2 \le \sigma_2^2/s_2^2$. The method which we have
used before to obtain inequalities breaks down here since when we
write down the expectation of $x_1 x_2$ we use only the fact that $x_1$ and
$x_2$ are uncorrelated and not the stronger condition that they are
independent. Birnbaum, Raymond and Zuckerman (1947) solved
the problem by approximating to a distribution by a discrete
distribution with probability at many points and then reducing the
number of points to at most four. We state the result in terms of
non-negative variables (which may be taken to be $x_1^2/s_1^2$ and $x_2^2/s_2^2$
for the present problem) as follows. Let $x$ and $y$ be non-negative
with $E(x) = \lambda$, $E(y) = \mu$ and $\lambda \le \mu$.
Then $\Pr(x + y \ge t) \le M(t)$ where

$$M(t) = \begin{cases} 1 & \text{if } t \le \lambda + \mu, \\ \mu/(t - \lambda) & \text{if } \lambda + \mu \le t \le \tfrac{1}{2}\big(\lambda + 2\mu + \sqrt{\lambda^2 + 4\mu^2}\big), \\ (\lambda + \mu)/t - \lambda\mu/t^2 & \text{if } \tfrac{1}{2}\big(\lambda + 2\mu + \sqrt{\lambda^2 + 4\mu^2}\big) \le t. \end{cases}$$

The inequality is realized with equality for certain discrete
distributions.
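
As a quick numerical companion to the statement above, the bound M(t) can be written as a small function. The name `brz_bound` is hypothetical and the sketch assumes strictly positive means; the two means are swapped if necessary so that the smaller plays the part of λ.

```python
import math


def brz_bound(lam, mu, t):
    """Upper bound M(t) on Pr(x + y >= t) for independent non-negative
    x, y with E(x) = lam, E(y) = mu (the smaller mean is treated as lam)."""
    if lam > mu:
        lam, mu = mu, lam
    if t <= lam + mu:
        return 1.0
    # Value of t at which the two non-trivial branches coincide.
    t_switch = 0.5 * (lam + 2 * mu + math.sqrt(lam ** 2 + 4 * mu ** 2))
    if t <= t_switch:
        return mu / (t - lam)
    return (lam + mu) / t - lam * mu / t ** 2
```

With λ = μ = 1 the switch-over occurs at t = (3 + √5)/2, and for every t > λ + μ the value lies below the bound (λ + μ)/t that is available without independence.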


We first suppose that $x_1$, $x_2$, $x_3$ are three values of $x$ (now assumed
to have a discrete distribution) at which the probabilities are
$p_1$, $p_2$, $p_3$ respectively. We shall replace these probabilities by
$p_1'$, $p_2'$, $p_3'$ such that $p_1'p_2'p_3' = 0$,

$$p_1' + p_2' + p_3' = p_1 + p_2 + p_3, \qquad (4.8.1)$$
$$x_1 p_1' + x_2 p_2' + x_3 p_3' = x_1 p_1 + x_2 p_2 + x_3 p_3, \qquad (4.8.2)$$

and, for the new distribution, $P = \Pr(x \ge u)$ is not less than its
value with the original distribution.

From (4.8.1) and (4.8.2) we have

$$p_1' = p_1 + (p_2 - p_2')\,\frac{x_3 - x_2}{x_3 - x_1}, \qquad p_3' = p_3 + (p_2 - p_2')\,\frac{x_2 - x_1}{x_3 - x_1}.$$

If $u$ does not lie in the interval $x_1 < u \le x_3$ then we have not changed
$P$; if $x_1 < u \le x_2$ then we can increase $P$ by decreasing $p_1$ and this
we do by taking

$$p_2' - p_2 = \min\left(p_1\,\frac{x_3 - x_1}{x_3 - x_2},\; p_3\,\frac{x_3 - x_1}{x_2 - x_1}\right),$$

while if $x_2 < u \le x_3$ we increase $P$ by taking $p_2' = 0$.
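
One reduction step of this kind can be sketched as follows. The function name `reduce_three_points` is hypothetical, and exact fractions are used only so that the conservation of total probability and of the mean can be checked without rounding; the points are assumed to satisfy x1 < x2 < x3.

```python
from fractions import Fraction


def reduce_three_points(xs, ps, u):
    """Shift probability among x1 < x2 < x3 so that total mass and mean
    are unchanged, one probability becomes zero, and Pr(x >= u) does not
    decrease."""
    x1, x2, x3 = xs
    p1, p2, p3 = ps
    if x1 < u <= x2:
        # Raise p2 until either p1 or p3 is exhausted.
        d2 = min(p1 * (x3 - x1) / (x3 - x2), p3 * (x3 - x1) / (x2 - x1))
    else:
        # x2 < u <= x3, or u outside (x1, x3] where P is unaffected:
        # empty the middle point.
        d2 = -p2
    q2 = p2 + d2
    q1 = p1 - d2 * (x3 - x2) / (x3 - x1)
    q3 = p3 - d2 * (x2 - x1) / (x3 - x1)
    return (q1, q2, q3)
```

For example, with points 0, 1, 3 carrying probabilities 1/4, 1/2, 1/4 and u = 1, the step empties the point 0 and raises Pr(x ≥ 1) from 3/4 to 1 while keeping the mean at 5/4.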


Now suppose that, starting from discrete distributions which
approximate as closely as we wish to the actual distributions of $x$ and
$y$, we have successively reduced the number of points with non-zero
probability until we have, for $x$, $p$ at $\alpha$ and $1 - p$ at $\beta$ ($\alpha < \beta$) while,
for $y$, we have $q$ at $\gamma$ and $1 - q$ at $\delta$ ($\gamma < \delta$), where

$$\lambda = p\alpha + (1 - p)\beta, \qquad \mu = q\gamma + (1 - q)\delta. \qquad (4.8.3)$$

Then, in the combined distribution, we have probabilities $pq$ at
$A\,(\alpha, \gamma)$, $p(1 - q)$ at $B\,(\alpha, \delta)$, $(1 - p)q$ at $C\,(\beta, \gamma)$ and $(1 - p)(1 - q)$
at $D\,(\beta, \delta)$.


If $t \le \lambda + \mu$ then the example $p = q = 0$ shows that $M(t) \ge 1$
and so $M(t) = 1$; suppose in what follows that $t > \lambda + \mu$. If $t < \beta$
we can, without altering $E(x)$, replace the distribution of $x$ by $p$ at
$\alpha' = \alpha + (1 - p)(\beta - t)/p$ and $1 - p$ at $t$; note that $t - \alpha' =
(t - \lambda)/p > 0$ so that $\Pr(x \ge t)$ is unaltered. Hence we may suppose
that $\beta \le t$ and similarly $\delta \le t$.


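If, at $A$, $x + y < t$ while at $B$, $C$, $D$, $x + y \ge t$ then

$$P = 1 - pq = 1 - \frac{(\beta - \lambda)(\delta - \mu)}{(\beta - \alpha)(\delta - \gamma)} \quad \text{(from (4.8.3))}$$
$$\le 1 - \frac{(\beta - \lambda)(\delta - \mu)}{(\beta + \delta - t)^2}$$

(since $\alpha + \delta \ge t$, $\beta + \gamma \ge t$). The right-hand side, regarded as a
function of $\beta$, has a single minimum and so its greatest value is

$$\max\big(1 - (t - \mu - \lambda)/(\delta - \mu),\; 1 - (t - \lambda)(\delta - \mu)/\delta^2\big)$$

since $t - \mu \le t - \gamma \le \beta \le t$.

Now $1 - (t - \mu - \lambda)/(\delta - \mu) \le \lambda/(t - \mu) \le \mu/(t - \lambda)$ since
$\delta \le t$, and $1 - (t - \lambda)(\delta - \mu)/\delta^2$, which, as a function of $\delta$, has a
single minimum, is not more than

$$\max\big(1 - (t - \lambda - \mu)/(t - \lambda),\; 1 - (t - \lambda)(t - \mu)/t^2\big)$$
(since $t - \lambda \le t - \alpha \le \delta \le t$),
$$= \max\big(\mu/(t - \lambda),\; (\lambda + \mu)/t - \lambda\mu/t^2\big).$$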
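If $A$ and $B$ lie on the same side of $x + y = t$ as the origin and
$C$, $D$ lie on the opposite side, then

$$P = 1 - p = 1 - (\beta - \lambda)/(\beta - \alpha)$$
$$= (\lambda - \alpha)/(\beta - \alpha) \le (\lambda - \alpha)/(t - \mu - \alpha)$$
$$\le \lambda/(t - \mu) \le \mu/(t - \lambda).$$

If $A$ and $C$ lie on the same side of $x + y = t$ as the origin, and
$B$ and $D$ lie on the opposite side, then

$$P = 1 - q = 1 - (\delta - \mu)/(\delta - \gamma)$$
$$= (\mu - \gamma)/(\delta - \gamma) \le (\mu - \gamma)/(t - \lambda - \gamma) \le \mu/(t - \lambda).$$

If $A$, $B$ and $C$ lie on the same side of $x + y = t$ as the origin and
$D$ lies on the opposite side, then

$$P = (1 - p)(1 - q) = \frac{(\lambda - \alpha)(\mu - \gamma)}{(\beta - \alpha)(\delta - \gamma)}$$

and if $\beta + \delta = t'$ then

$$P \le \lambda\mu/\beta\delta = \lambda\mu/\beta(t' - \beta) \le \max\big(\mu/(t' - \lambda),\; \lambda/(t' - \mu)\big)$$

since $\lambda \le \beta \le t' - \mu$, i.e.

$$P \le \mu/(t' - \lambda) \le \mu/(t - \lambda).$$

Hence for $\lambda + \mu < t$ we have

$$P \le \max\big(\mu/(t - \lambda),\; (\lambda + \mu)/t - \lambda\mu/t^2\big)$$

which is $\mu/(t - \lambda)$ if $t \le \tfrac{1}{2}(\lambda + 2\mu + \sqrt{\lambda^2 + 4\mu^2})$ and is
$(\lambda + \mu)/t - \lambda\mu/t^2$ if $t \ge \tfrac{1}{2}(\lambda + 2\mu + \sqrt{\lambda^2 + 4\mu^2})$.
These bounds can be attained; e.g. $\alpha = \lambda$, $\beta = t$, $\gamma = 0$,
$\delta = t - \lambda$ give $P = \mu/(t - \lambda)$, while $\alpha = \gamma = 0$, $\beta = \delta = t$ give
$P = (\lambda + \mu)/t - \lambda\mu/t^2$.

Without the hypothesis of independence the best result we can
obtain is $P \le (\lambda + \mu)/t$.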
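
The two extremal configurations quoted above can be checked by direct enumeration. In this sketch the helper names `two_point` and `tail_of_sum` are hypothetical, and exact fractions are used so that the attained probabilities can be compared with the formulae without rounding.

```python
from fractions import Fraction as F


def two_point(mean, lo, hi):
    """Two-point distribution on {lo, hi} with the given mean,
    returned as a dict {value: probability}."""
    if lo == hi:
        return {lo: F(1)}
    p_lo = F(hi - mean) / F(hi - lo)
    return {lo: p_lo, hi: 1 - p_lo}


def tail_of_sum(dist_x, dist_y, t):
    """Pr(x + y >= t) for independent discrete x and y."""
    return sum(px * py
               for x, px in dist_x.items()
               for y, py in dist_y.items()
               if x + y >= t)
```

With λ = 1, μ = 2, t = 4, the first configuration (x fixed at λ, y on 0 and t − λ) gives 2/3 = μ/(t − λ), and the second (both variables on 0 and t) gives 5/8 = (λ + μ)/t − λμ/t²; both are below 3/4 = (λ + μ)/t.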


Although it would be possible in principle to use the above 
method for the sum of more than two variables it is clear that there 
would be many more cases to consider and the working would be 
long. The principle of reduction can be employed when more than 
one expectation is given, but working with more than three values 
at a time and so ending with more than two with non-zero prob- 
abilities; this was done by Hoeffding (1955). The particular case 
when the x’s have the same distribution was discussed by these 
means by Hoeffding and Shrikhande (1955). 


4.9 A monotonicity condition: ellipsoidal region 


In the previous sections of this chapter we have used only 
moments of the distributions. A restriction on the shape of the 
distribution was introduced by Leser (1942). 


We take T to be the region (x- / At o4)? < n. 


Let n Ag? = 3 A; n ag? = 2; oi?, R? = (A8 / n) + > (xt / At ot)? 
(so that $T$ is $R \le \lambda_0$).

We let $A(R_0)$ be the mean value of the p.d.f. on the ellipsoid
$R = R_0$ and suppose that $A(R)$ is a non-increasing function of
$R$ for $R \le K$. We thus have a condition analogous to that imposed
by Narumi (see Exercise 12).


If the surface content of the ellipsoid (A3/m) L (x: / At o = R? 
is C Rui then we have 


1 = [7 cgo A ak, -[ CR A (R) dR, 
0 0 


A. 
Sas Il CR- A (R) aR. 
0 


Put 4 = CA(A,)/n and Ky = {à} + (1— P/u /n. 


E Eo, 
PRI (4.9.1) 


then since K > A, we have 
A. oo 
1 = f CR A(R) dR + f CR» A (R) dR = I, + H say. 
0 Ào 


Ào 

Further, we have J, > f CA (ào) RUM dR. 
0 

Also 


oo K, 
f CA (R) RMdR—1— P= f CA (ào) R”-1 dR 
e Ào 


or Ji - CA (R) R™ dR = f i CR*-1 (A (ào) — A (R)) dR. (4.9.2) 


The integrands on both sides are non-negative, and the values of
$R$ in the left-hand integrand are larger than the values of $R$ in the
right-hand integrand. Hence, if we multiply both integrands by
$R^2$ we obtain


Í - CA (R) R^" dR > f : CR™ (A (ào) — A (R)) dR 


K, 
or I> CA Qo f R dR 
Ae 
K,
and so 1 >Í CA (ào) R. dR = CA (X, K$**/(# + 2), 
0 
whence PS IMA — u((n + 2)/nu)n/*, (4.9.3) 
Also from K > A, we have 
m 
P f CR- A (Ay) dR = Mu. (4.9.4) 
0 
If A, < K < Ko then 
P <1 + (A K”); (4.9.5) 
(4.9.4) still holds, but the integrand on the right-hand side of (4.9.2) 


is no longer necessarily non-negative. 


We now write 
L CA (R) R" dR = n (1— P) (f. Re ar) | (Ke — X) 
= CA (ANKE — xy (f^. Re aR) / ae — x) 
* . Rui AR CA (A) + ( f . Ra dR) 
(Kö — K”) CA O /K — N). 
Hence f - CA (R) R dR = f : CR- (A (à) — A(R)) dR 


4- ( * Rna dR) CA (Ay) (KR KH, — X). 


Ae 


We multiply the integrands by R?, R?, K? respectively and obtain 
E 
n f C (Ag) Re AR + 


K 
( A Rr dR) CA (ào) (Kò — K”) K?/(K" — 2$) 
whence ° 


K 
1> f CA (ào) R^ dR + 
0 


( "ges dR) CA Qu) (K3 — K”) K*(K» — 2), 


ro 
1 >K"+2 CA (A,)/(n + 2) + K? CA (A(K3 — K”)jn,


whence 
P>1— K- + (A3 — 2K"/(n + 2))u. (4.9.6) 
Finally 


II > 0, 1,2 N F CR?! A (R) dR = X} (1 — P), 
ro 
so that P>1W— , but this bound is improved on by the 
inequalities above whenever they are applicable. 


For A, « K we now have to find the minimum, as u varies, of 
the lower bound for P, using the sets of inequalities (a) (4.9.1), 
(4.9.3) and (4.9.4) or (b) (4.9.4), (4.9.5) and (4.9.6). 

If K « 1, (b) with u = 0 give P = 0, and (a) can give no less. 


If Aj > 2K"/(n + 2) then the least bound from (b) is 1 — K-? 
for wu — 0, and since the bound in (4.9.3) has minimum 


1— (2/(n + 2)?/^ 33 for u= (" t =) CL) Ag"? and 
this is not less than 1— K-?, we have P > 1— XK. 
Now suppose that g < 2K"/(nm + 2). 
If 1 « K « 4/((n + 2)/n} then the lower bound from (b) is 
A$(n--2)(1— K-*)/2K" for u = (n + 2) (1— K-?)/2K*^. 


If 4/ ((n + 2)/n) « K then this bound is inadmissible on account 
of (4.9.5), and we have instead 


1 + (n + 2)(A§ — K")/nK"*? for u = (n + 2)/n K... 


Also if 1 < K< vy {(m + 2)/n} the lower bound from (a) is (Ag/K)" 
for u = K-^, while if Vn + 2)/n}< K the lower bound from 
(a) is 1 — (2/(n + 2) / X? if 


ào > (2/(n + 2))/^ ((n + 2)/n} 
and Ag (n/(n + 2))"? for u = {n(n + 2))* if 


ào « (2/(n + 2)" {(n + 2) N. 
and


and 


and 


Combining these results, we have as the lower bound for P: 


0 SEAQuE Kel 
bd NX 


1— K- if 2K"/(n + 2) « A3 « K^. 


AB(n4-2)0 — K-)2K^ if i K VVA 2)/n) 


Ay < 21/n K (n + 2)-1/n, 


n^? A (n + 2)" if VIC + 2/n) c K 


"m 2 Y^ (n+ 21i 

OR) Cx) 

2 \2/n n-+2 
CN i) <x 


2 y n YYY / 2 ym 
(49 C3 CE" 
CHAPTER V


SUMS OF VARIABLES 


5.1 Introduction 

In this chapter we consider not single variables but the sums of a 
number of variables, not necessarily all having the same distribution. 
In a sense we are dealing with a special case of multivariate
distributions, since the joint p.d.f. of a sample $x_1, ..., x_n$ can be regarded
as a multivariate p.d.f. for the point $(x_1, ..., x_n)$, and restrictions on
the sum $x_1 + ... + x_n$ (or on partial sums) define regions in the
sample space. The sum, however, has a particular interest owing to 
the Central Limit Theorem which states that under certain condi- 
tions the distribution of a sum tends to normality as the number of 
variables summed tends to infinity, and for that reason results 
obtained in this connection have been collected in a separate 
chapter (see Section 5.4 for a more detailed reference to the Central 
Limit Theorem). As in previous chapters we concentrate on 
methods which yield definite numerical bounds and usually ignore 
results which contain undetermined constants. We assume the x's 
to be independently distributed unless the contrary is stated. 


5.2 Population and sample variances given 


The inequality which follows, due to Guttman (1948b), is 


unusual in that both the population variance and the sample variance 
are used in it. 


If the average of $x_1, ..., x_n$ is $\bar{x}$ and the maximum likelihood
(biased) estimate of the sample variance is $s^2 = \Sigma(x_i - \bar{x})^2/n$ then
we have

$$E((\bar{x} - \mu_1')^2) = \mu_2/n, \qquad E((\bar{x} - \mu_1')^4) = (\mu_4/n^3) + 3(n - 1)\mu_2^2/n^3,$$
$$E(s^2) = (n - 1)\mu_2/n, \qquad E(s^4) = (n - 1)^2\mu_4/n^3 + (n^2 - 2n + 3)(n - 1)\mu_2^2/n^3,$$

and

$$E((\bar{x} - \mu_1')^2 s^2) = (n - 1)\mu_4/n^3 + (n - 1)(n - 3)\mu_2^2/n^3.$$


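Hence if we put $u = (\bar{x} - \mu_1')^2 - s^2/(n - 1) - c\mu_2$ then we have
$E(u^2) = \mu_2^2\,(c^2 + 2/\{n(n - 1)\})$. Hence, from Markov's inequality
(see Exercise 3) we have

$$\Pr\Big\{|u| \le \lambda\mu_2\sqrt{c^2 + \tfrac{2}{n(n-1)}}\Big\} \ge 1 - \lambda^{-2},$$

i.e.

$$\Pr\Big\{(\bar{x} - \mu_1')^2 \le \frac{s^2}{n-1} + c\mu_2 + \lambda\mu_2\sqrt{c^2 + \tfrac{2}{n(n-1)}}\Big\} \ge 1 - \lambda^{-2}. \qquad (5.2.1)$$

(We have now possibly increased the probability by including the
range

$$0 \le (\bar{x} - \mu_1')^2 < \frac{s^2}{n-1} + c\mu_2 - \lambda\mu_2\sqrt{c^2 + \tfrac{2}{n(n-1)}}$$

if this exists.)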


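To minimize the bound for $(\bar{x} - \mu_1')^2$ in (5.2.1) we take $c < 0$ and
$c^2 = 2/\{n(n - 1)(\lambda^2 - 1)\}$ to give

$$\Pr\Big\{(\bar{x} - \mu_1')^2 \le \frac{s^2}{n-1} + \mu_2\sqrt{\frac{2(\lambda^2 - 1)}{n(n-1)}}\Big\} \ge 1 - \lambda^{-2}. \qquad (5.2.2)$$

A straightforward application of Markov's inequality to $(\bar{x} - \mu_1')^2$
would give

$$\Pr\{|\bar{x} - \mu_1'| \le \lambda\sqrt{(\mu_2/n)}\} \ge 1 - \lambda^{-2} \qquad (5.2.3)$$

so that we have replaced the bound $\lambda^2\mu_2/n$ by

$$\frac{s^2}{n-1} + \mu_2\sqrt{\frac{2(\lambda^2 - 1)}{n(n-1)}}.$$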

If $\mu_2 = 1$, $s = 1$, $n = 20$, then (5.2.3) gives

$\Pr\{(\bar{x} - \mu_1')^2 \le 1\} \ge 19/20 =$ ·95,

while (5.2.2) gives

$\Pr\{(\bar{x} - \mu_1')^2 \le 1\} \ge 3240/3259 =$ ·9942 ...
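
The arithmetic of this example can be reproduced exactly. The sketch below is illustrative only: the function names are hypothetical, and each function solves the relevant bound for λ² when the bound on $(\bar{x} - \mu_1')^2$ is prescribed, so that the probability 1 − λ⁻² can be read off.

```python
from fractions import Fraction


def markov_lambda_sq(n, mu2, bound):
    """lambda^2 for which the bound lambda^2 * mu2 / n of (5.2.3)
    equals `bound`."""
    return Fraction(bound) * n / Fraction(mu2)


def guttman_lambda_sq(n, mu2, s2, bound):
    """lambda^2 for which the bound of (5.2.2),
    s^2/(n-1) + mu2 * sqrt(2(lambda^2 - 1)/(n(n-1))), equals `bound`."""
    root = (Fraction(bound) - Fraction(s2) / (n - 1)) / Fraction(mu2)
    return 1 + root ** 2 * n * (n - 1) / 2


def probability(lambda_sq):
    """Lower bound 1 - lambda^{-2} on the probability."""
    return 1 - 1 / Fraction(lambda_sq)
```

With n = 20 and μ₂ = s² = 1 the first route gives λ² = 20 and the second λ² = 3259/19, which reproduces the two probabilities quoted in the text.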
5.3 Symmetrical unimodal bounded distribution

In the case when $x_i$ has a symmetrical unimodal bounded
distribution (not necessarily the same for each $i$) we can obtain an
inequality by showing that the rectangular distribution is the most
extreme case. We suppose that $f(x_i)$ is zero for $|x_i| > a$, and let $y_k$ be
the sum of $k$ variables distributed in the rectangular distribution
with mean zero and range 2. We now prove, by induction on $n$,
that

$$\Pr\Big\{\Big|\sum_1^n x_i\Big| \ge ba\Big\} \le \Pr\{|y_n| \ge b\} \quad \text{for all } b \ge 0.$$


Let $F(t)$, $G(t)$ be respectively the distribution functions of $x_1/a$
and of the rectangular distribution with mean zero and range 2;
since the p.d.f. of $x_1$ is non-increasing in the interval $0 \le x_1 \le a$ the
graph of $F(t)$ is concave downwards for $0 \le t \le 1$, and since it
passes through the points $(0, \tfrac{1}{2})$ and $(1, 1)$ on the graph of $G(t)$
(which is a straight line between these points) we have

$$F(t) \ge G(t) \quad \text{for } 0 \le t \le 1.$$

Hence $\Pr(x_1 \ge ba) = 1 - F(b) \le 1 - G(b) = \Pr(y_1 \ge b)$.

The reasoning for negative values of the variables is similar, and
this establishes the truth of the proposition for $n = 1$.


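Now let $F_n$ be the distribution function of $(x_1 + ... + x_n)/a$,
$F$ that of $x_{n+1}/a$, $G_n$ that of $y_n$, and $G$ that of $(y_{n+1} - y_n)$.
For $b \ge 0$ we now have

$$-\Pr\Big(\sum_1^{n+1} x_i \ge ba\Big) + \Pr(y_{n+1} \ge b)$$
$$= \int_{-\infty}^{\infty} \big(F_n(b - s)\,dF(s) - G_n(b - s)\,dG(s)\big)$$
$$= \int_{-\infty}^{\infty} F_n(b - s)\,\big(dF(s) - dG(s)\big) + \int_{-\infty}^{\infty} \big(F_n(b - s) - G_n(b - s)\big)\,dG(s)$$
$$= \int_{-\infty}^{\infty} \big(F(b - s) - G(b - s)\big)\,dF_n(s) + \int_{-\infty}^{\infty} \big(F_n(b - s) - G_n(b - s)\big)\,dG(s), \qquad (5.3.1)$$

(on integrating by parts in the first integral and then replacing
$s$ by $b - s$).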


We can now write the first integral as

$$\int_0^{\infty} (F(s) - G(s))\,dF_n(b - s) + \int_0^{\infty} (F(-s) - G(-s))\,dF_n(b + s), \qquad (5.3.2)$$

by dividing the interval of integration into ranges $-\infty$ to $b$ and
$b$ to $\infty$ and replacing $s$ by $b - s$ or $b + s$ respectively. Now because
of the symmetry of the distributions of $x_{n+1}$ and of $(y_{n+1} - y_n)$ we
have $F(-s) = 1 - F(s)$ and $G(-s) = 1 - G(s)$, so that the
expression in (5.3.2) becomes

$$\int_0^{\infty} (F(s) - G(s))\,\big(dF_n(b - s) - dF_n(b + s)\big).$$

Now $F(s) - G(s) \ge 0$ for $0 \le s$, while

$$dF_n(b - s) - dF_n(b + s) \ge 0 \quad \text{if } 0 \le s \le b,$$

since the derivative of $F_n(s)$ is non-increasing for positive $s$, and if
$b < s$ then

$$dF_n(b - s) - dF_n(b + s) = dF_n(s - b) - dF_n(s + b)$$

by the symmetry of $dF_n(s)$, so that again

$$dF_n(b - s) - dF_n(b + s) \ge 0.$$

Hence the first integral in (5.3.1) is non-negative and so, similarly,
is the second. This completes the proof by induction, which was
given, in a more general form, by Birnbaum (1948). The need for
the monotonicity condition is shown in Exercise 18.
Now $\Pr(|y_n| \ge b)$ is


5 + (— 1)* ^C, {5 1 du (5.3.3) 


n - 
g ( A Sn 


(see, e.g., Kendall and Stuart (1958, 1963), p. 257) and so
$\Pr(|\sum_1^n x_i| \ge ba)$ is not greater than the expression in (5.3.3).
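
For numerical work the tail probability of the sum of rectangular variables can be evaluated from the standard finite-sum form of its distribution function (the Irwin-Hall distribution). The sketch below uses that form, which is not necessarily the form in which (5.3.3) is written; the function name `rect_sum_tail` is hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def rect_sum_tail(n, b):
    """Pr(|y_n| >= b), where y_n is the sum of n independent variables
    uniform on (-1, 1) and b >= 0.

    Uses y_n = 2S - n with S a sum of n uniforms on (0, 1), whose
    distribution function is (1/n!) * sum_k (-1)^k C(n, k) (x - k)^n
    taken over integers k <= x."""
    b = Fraction(b)
    if b >= n:
        return Fraction(0)
    c = (n - b) / 2              # Pr(y_n >= b) = Pr(S <= c) by symmetry
    total = Fraction(0)
    k = 0
    while k <= c:
        total += (-1) ** k * comb(n, k) * (c - k) ** n
        k += 1
    one_sided = total / factorial(n)
    return min(Fraction(1), 2 * one_sided)
```

By the inequality proved above, `rect_sum_tail(n, b)` is then an upper bound for the probability that the sum of n symmetrical unimodal variables, each confined to (−a, a), is at least ba in modulus.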
5.4 Comparison with normal distribution

The sum of a number of random variables tends, as the number
tends to infinity, to have the normal distribution under certain 
conditions (see, e.g., Loéve (1955), Chapter VI). It is thus possible 
to approximate to the probability that the sum should fall in a certain 
interval by means of the known probability that a normal variable 
should do so. A bound for the difference between the distribution 
functions seems to have been given first by Liapounoff in a form 
involving an unknown constant; later writers have given numerical 
values for the constant and so enabled a definite numerical result to 
be obtained by use of the theorem. Since the analysis which leads 
to the results is too long to be reproduced here we merely give 
results and references. 


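Cramér (1923) and (1928) proved that if the $x_i$ are independent
with $E(x_i) = 0$, $E(x_i^2) = \sigma_i^2$, $E(|x_i|^3) = \tau_i^3$ $(i = 1, ..., n)$ and
$s_n^2 = \sigma_1^2 + ... + \sigma_n^2$, $t_n^3 = \tau_1^3 + ... + \tau_n^3$, then the distribution
function $F_n$ of $x = \Sigma x_i/s_n$ satisfies

$$\left| F_n(x) - \frac{1}{\sqrt{(2\pi)}} \int_{-\infty}^{x} e^{-t^2/2}\,dt \right| \le \frac{3\,t_n^3}{s_n^3}\,\log n \quad \text{for } n > 1. \qquad (5.4.1)$$

Using this result Offord (1945) showed that the probability of
$x_1 + ... + x_n$ lying in an interval of length $2\lambda$ is not greater than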


n 


6 log n kA 
"Pen (log n -+ — ) (5.4.2) 


min o, 
where ci] 7i > 2 Ks, 
Moreover $n$, $k$, $\min \sigma_i$ can be replaced in (5.4.2) by the
corresponding quantities for a subsequence of at least two terms from
$x_1, ..., x_n$. (But the inequality still refers to the sum of all the $x$'s.)


Bergström (1949) replaced the factor $3 \log n$ in (5.4.1) by 4·8 and
also gave a result for the case when the $x$'s are not independent.


Instead of using the sum of the third moments, Berry (1941)
replaced the right-hand side of (5.4.1) by

$$\frac{1{\cdot}88}{s_n}\,\max_i \frac{\tau_i^3}{\sigma_i^2}.$$

This result is better than Bergström's when the distributions of the
$x_i$ are identical, but not necessarily so if they are different. Berry's
proof contains errors (see Hsu (1945), p. 3), and the correctness of
the constant 1·88 has been disputed. The value 2·031 has been given
by Takano (1950). In the case when the distributions of the $x_i$ are
the same, Ikeda (1959) has obtained better values of the constant by 
requiring that 7,/s? shall not be too small. 


To end the section we show that the removal of the log term in
(5.4.1) has produced a right-hand side of the correct order; further
improvement must lie in the direction of improving the constant.
Let the $x_i$ have the binomial distribution with probabilities $\tfrac{1}{2}$ at
$x = \pm 1$, so that $\sigma_i = \tau_i = 1$, $s_n = \sqrt{n}$, and $t_n^3 = n$. For even $n$ the
probability that $\Sigma x_i = 0$ is $\binom{n}{n/2}\,2^{-n}$, which is asymptotically
$(\tfrac{1}{2}\pi n)^{-1/2}$ by using Stirling's formula for the factorials. Consequently
at points near the origin on either side

$$\left| F_n(x) - \frac{1}{\sqrt{(2\pi)}} \int_{-\infty}^{x} e^{-t^2/2}\,dt \right|$$

must be about $\tfrac{1}{2}(\tfrac{1}{2}\pi n)^{-1/2} = (2\pi n)^{-1/2}$, so that the term $n^{-1/2}$ is of the
correct order and the constants 1·88, etc. cannot be less than
$(2\pi)^{-1/2} =$ ·399 ....
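
The size of the central binomial term can be checked numerically. The sketch below (function name hypothetical) evaluates the exact probability that the sum of n variables taking the values ±1 with probability ½ each is zero; multiplied by √n it approaches √(2/π), and half of that limit, 1/√(2π), is about 0·399.

```python
from math import comb, pi, sqrt


def central_probability(n):
    """Exact Pr(x_1 + ... + x_n = 0) for n even, each x_i = +1 or -1
    with probability 1/2."""
    return comb(n, n // 2) / 2 ** n


for n in (10, 100, 1000, 10000):
    jump = central_probability(n)
    # Scaled jump of F_n at the origin, and half of it.
    print(n, jump * sqrt(n), 0.5 * jump * sqrt(n))
```

Since the normal distribution function is continuous, the difference between it and $F_n$ must be at least about half the jump on one side of the origin, which is what fixes the order $n^{-1/2}$.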


5.5 Restrictions on all moments 

A result under the condition that the rate of growth of moments
be not too large was given by Bernstein (1924). Suppose that for the
distribution of each $x_i$ there exists a constant $H$ such that
$|\mu_r| \le \tfrac{1}{2}H^{r-2}\,(r!)\,\mu_2$ for $2 \le r$. Note that if the range of $x_i$ is finite
so that $|x_i| \le M$, then we have

$$|\mu_r| \le \int M^{r-2} x^2 f(x)\,dx = M^{r-2}\mu_2,$$

and

$$\frac{|\mu_r|}{r!\,\mu_2} \le \frac{M^{r-2}}{r!} \le \frac{1}{2}\left(\frac{M}{3}\right)^{r-2},$$

so that we may take $H = M/3$.
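Now $E(e^{\theta x_i}) = E(1 + \theta x_i + \theta^2 x_i^2/2! + ...)$, so that

$$E(e^{\theta x_i}) \le 1 + \tfrac{1}{2}\theta^2\mu_2\,(1 + H|\theta| + H^2\theta^2 + ...) \le 1 + \frac{\theta^2\mu_2}{2(1 - c)}$$

if there exists $c$ such that $H|\theta| \le c < 1$.
Hence
$E(e^{\theta x_i}) \le e^{\theta^2\sigma_i^2/2(1 - c)}$ and $E(e^{\theta\Sigma x_i}) \le e^{\theta^2 s_n^2/2(1 - c)}$
(with $s_n^2$ defined as in Section 5.4), so that, for $\theta > 0$,

$$\Pr\{e^{\theta\Sigma x_i} \ge e^{\theta^2 s_n^2/2(1 - c) + \lambda^2}\} \le e^{-\lambda^2},$$

i.e.

$$\Pr\Big\{\Sigma x_i \ge \frac{\lambda^2}{\theta} + \frac{\theta s_n^2}{2(1 - c)}\Big\} \le e^{-\lambda^2}.$$

To minimize the bound we take $\theta^2 = 2(1 - c)\lambda^2/s_n^2$ to give

$$\Pr\{\Sigma x_i \ge 2\lambda s_n/\sqrt{(2(1 - c))}\} \le e^{-\lambda^2}$$

and for the result to be valid we need $2(1 - c)\lambda^2/s_n^2 \le c^2/H^2$.
If, for example, we choose $c = \tfrac{1}{2}$, then we have

$$\Pr\{\Sigma x_i \ge 2\lambda s_n\} \le e^{-\lambda^2} \quad \text{for } 0 < \lambda \le s_n/2H.$$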
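
As an illustration, the bound with c = ½ can be compared with an exact tail probability. The sketch assumes variables taking the values ±1 with probability ½ each, so that M = 1, H = 1/3, $\sigma_i = 1$ and $s_n = \sqrt{n}$; the function names are hypothetical.

```python
from math import comb, exp, factorial, sqrt


def bernstein_bound(lam):
    """Bound e^{-lam^2} on Pr(sum x_i >= 2 * lam * s_n), valid (with
    c = 1/2) for 0 < lam <= s_n / (2H)."""
    return exp(-lam ** 2)


def rademacher_tail(n, threshold):
    """Exact Pr(x_1 + ... + x_n >= threshold) when each x_i is +1 or -1
    with probability 1/2; the sum equals 2k - n for k plus signs."""
    favourable = sum(comb(n, k) for k in range(n + 1)
                     if 2 * k - n >= threshold)
    return favourable / 2 ** n


def moment_condition_holds(max_r):
    """Check r! >= 2 * 3^(r-2) for 2 <= r <= max_r, the inequality that
    justifies taking H = M/3 for a bounded variable."""
    return all(factorial(r) >= 2 * 3 ** (r - 2)
               for r in range(2, max_r + 1))
```

For n = 100 the admissible range is 0 < λ ≤ 15, and for instance λ = 1 compares the exact probability that the sum is at least 20 with the bound $e^{-1}$.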


This theorem has been modified by Bernstein (1937), using the
idea of expectation of $x_k$ relative to $x_1, ..., x_{k-1}$, to remove the
restriction that the $x_i$ be independent. Modifications due to Craig
(1933) are to take moments over an arbitrarily large, but finite,
interval $-b \le x \le b$ or to work with cumulants instead of moments.


5.6 An inequality for partial sums 

Instead of working with the sum $x_1 + ... + x_n$, as in the earlier
sections of this chapter, we may consider all the partial sums
$x_1$, $x_1 + x_2$, ... and seek a bound for the probability that these lie in
certain intervals. Such a bound was first given by Kolmogoroff
(1928) (see Kolmogoroff (1929) for corrections to parts of this paper),
but we give here the generalized result due to Hajek and Rényi
(1955), which is itself a special case of a still more general theorem of
Birnbaum and Marshall (1961).


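We suppose that $x_1, x_2, ...$ is a sequence of mutually
independent random variables with zero means and finite variances
$\sigma_k^2 = E(x_k^2)$. $c_1, c_2, ...$ is a non-increasing sequence of positive
numbers.

If

$$z = \sum_{k=n}^{m-1} (x_1 + ... + x_k)^2\,(c_k^2 - c_{k+1}^2) + c_m^2\,(x_1 + ... + x_m)^2$$

then

$$E(z) = \sum_{k=n}^{m-1} (c_k^2 - c_{k+1}^2)(\sigma_1^2 + ... + \sigma_k^2) + c_m^2\,(\sigma_1^2 + ... + \sigma_m^2)
= c_n^2 \sum_{1}^{n} \sigma_k^2 + \sum_{n+1}^{m} c_k^2\sigma_k^2.$$

For a value $r$ such that $n \le r \le m$ let $E_r$ be the event

$$|x_1 + ... + x_s| < \epsilon/c_s \;(n \le s < r), \qquad |x_1 + ... + x_r| \ge \epsilon/c_r.$$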


We now consider expectations of various quantities on the hypothesis 
that E, has occurred. 


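We have, for $r < i \le m$, $E(x_i \mid E_r) = 0$, so that

$$E((x_1 + ... + x_k)^2 \mid E_r) = E\big((x_1 + ... + x_r)^2 + 2(x_1 + ... + x_r)(x_{r+1} + ... + x_k) + (x_{r+1} + ... + x_k)^2 \mid E_r\big)$$
$$\ge E((x_1 + ... + x_r)^2 \mid E_r) \ge \epsilon^2/c_r^2 \quad (r \le k \le m).$$

Hence

$$E(z \mid E_r) \ge \frac{\epsilon^2}{c_r^2}\Big(\sum_{k=r}^{m-1} (c_k^2 - c_{k+1}^2) + c_m^2\Big) = \epsilon^2$$

and

$$E(z) \ge \sum_r E(z \mid E_r)\,P(E_r) \ge \epsilon^2 \sum_r P(E_r).$$

$$\sum_r P(E_r) = \Pr\Big\{\max_{n \le k \le m} c_k\,|x_1 + ... + x_k| \ge \epsilon\Big\}
\le \Big(c_n^2 \sum_{1}^{n} \sigma_k^2 + \sum_{n+1}^{m} c_k^2\sigma_k^2\Big)\Big/\epsilon^2. \qquad (5.5.1)$$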
1 n+1 


Kolmogorov proved the special case in which $n = 1$ and all the
$c_k$ are 1. If we take $c_k = 1/k$ and $\sigma_k = \sigma$ then we have

$$\Pr\Big\{\max_{n \le k \le m} |x_1 + ... + x_k|/k \ge \epsilon\Big\} \le \frac{\sigma^2}{\epsilon^2}\Big(\frac{1}{n} + \sum_{n+1}^{m} \frac{1}{k^2}\Big) \le \frac{2\sigma^2}{n\epsilon^2}.$$

Hence $|x_1 + ... + x_n|/n$ converges to zero ($= E(x_k)$) with
probability one; this is a form of the strong law of large numbers
(see Loéve (1955)).
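
The inequality can be checked by complete enumeration in a small case. The sketch below assumes steps taking the values ±1 with probability ½ each (so that every variance is 1); the function names are hypothetical, and sequences are stored in ordinary lists so that $c_k$ is `c[k - 1]`.

```python
from fractions import Fraction
from itertools import product


def hajek_renyi_rhs(c, var, n, m, eps):
    """(c_n^2 * sum_{k<=n} var_k + sum_{n<k<=m} c_k^2 var_k) / eps^2."""
    head = c[n - 1] ** 2 * sum(var[:n])
    tail = sum(c[k - 1] ** 2 * var[k - 1] for k in range(n + 1, m + 1))
    return (head + tail) / eps ** 2


def exact_max_prob(c, n, m, eps):
    """Exact Pr(max_{n<=k<=m} c_k |x_1 + ... + x_k| >= eps) for
    independent steps equal to +1 or -1 with probability 1/2."""
    hits = 0
    for signs in product((-1, 1), repeat=m):
        partial = 0
        for k, x in enumerate(signs, start=1):
            partial += x
            if k >= n and c[k - 1] * abs(partial) >= eps:
                hits += 1
                break
    return hits / 2 ** m
```

With $c_k = 1/k$, n = 3, m = 10 and ε = 1 the event requires the first three steps to have the same sign, so the exact probability is ¼, which lies below the right-hand side of the inequality and below $2\sigma^2/n\epsilon^2 = 2/3$.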


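Marshall (1960) showed that for the one-sided inequality we have

$$\Pr\Big\{\max_{1 \le k \le n} (x_1 + ... + x_k) \ge \epsilon\Big\} \le \frac{\sigma_1^2 + ... + \sigma_n^2}{\epsilon^2 + \sigma_1^2 + ... + \sigma_n^2}.$$

This was proved by putting

$$z = \left(\frac{\epsilon \sum_1^n x_i + \sigma_1^2 + ... + \sigma_n^2}{\epsilon^2 + \sigma_1^2 + ... + \sigma_n^2}\right)^2,$$

taking $E_r$ as $x_1 + ... + x_s < \epsilon$ $(1 \le s < r)$, $x_1 + ... + x_r \ge \epsilon$ and
proceeding as before.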


Marshall also discussed the question of introducing multipliers 
in the way Hajek and Rényi did, but showed that even for $n = 2$
the result was complicated.

CHAPTER VI


APPLICATIONS 


In this chapter we consider what use may be made of the material 
in the previous chapters, not only by the statistician but by the pure 
mathematician. For the latter the chief interest of any survey of 
work must be as a source of ideas for further work — in filling in 
gaps in the existing theory, in extending existing ideas to cover new 
or more general situations, or in embedding the whole theory in some 
wider set of ideas. The scope for further work increases as we go 
through the monograph; in Chapter II (apart from the last section) 
there is effectively a complete solution to the problem of finding 
bounds for probability, and all that is needed is to devise ways of 
reducing the amount of computation involved (though this is, of 
course, a far from trivial matter). All this is, however, done under the 
assumption that bounds exist; obviously, if moments arise from an 
actual distribution there is at least one value for the probability in a 
given set, but if we start from an arbitrarily chosen set of numbers it 
is less easy to decide whether they are realizable with a distribution; 
this applies with more force in Chapter III where fewer of the 
restrictive conditions on data such as (2.2.2) are known. Mallows 
(1956) suggests (in his Theorem III, which is really a conjecture) 
that the conditions are realizable if his method leads to bounds, 
and the same seems likely to be true for the methods in this
monograph. (Exercise 1 gives slight support to this view.) The existence
of generality does not obtain in Sections 2.10 and 3.7, where we deal 
only with very specialized problems, and there is considerable scope 
for further work. 


In Chapter IV the situation changes radically; here, when a 
general method applies it leads to computational problems (as in 
Section 4.2) of a different order of difficulty, and this using only 
second-order moments and a simple type of region. Only by
specializing the problems still further in some of the later sections
can we obtain explicit solutions. Each of the sections of this chapter
is to be regarded only as a sample of the type of complexity likely to
be met with when more general methods are developed. The work of
Whittle (1958a) is particularly interesting in taking what amounts 
to the limiting case of a multivariate distribution as its dimension 
becomes infinite and introducing quite different techniques and 
conditions. 


In Chapter V again we have little generality but only a collection
of special results, although by the exclusion of results expressed in
phrases such as "... all sufficiently large ..." in favour of results
giving definite numerical values we have ignored a large body of
intricate work.


If the pure mathematician finds the subject full of gaps the 
statistician may find it all too full of material; his requirement is for 
a formula which will, as speedily as possible, produce a useful result 
from his data. The aim of the section headings has been to indicate 
where results of a particular type may be found (the exercises 
associated with a section should be consulted at the same time), 
but for quick reference an excellent résumé of the simpler results has 
been provided by Savage (1961). Savage also gives three examples of 
the application of inequalities; the first relates to the heights of 
soldiers, which are known to lie between two bounds, and the 
probability that the mean of a sample differ by more than an inch 
from the population mean is required. This problem can be solved 
by using the fact that the variable is bounded to give an estimate of 
its variance and then using Tchebychef’s inequality or, more 
accurately, Bernstein’s inequality. The second example relates to 
the magnitude of cumulative sums; variance is supposed to be 
known from past experience, and Kolmogorov’s inequality is used. 
The third example relates to correlated variables; the coefficient of 
correlation is supposed to be known, and Berge’s inequality is used. 
An application of a different type was suggested by Barton (in the
discussion following the paper by Mallows (1956)) who noted that
the distributions of test functions under hypotheses alternative to the
null hypothesis might not have many numerical parameters known
but might reasonably be supposed to be "smooth" in the sense that
the p.d.f. and its derivatives are not very different from what we have
with the (known) distribution under the null hypothesis. Using
these ideas, he showed that quite close estimates can be obtained of
the number of trials necessary to give conclusions with a given
degree of confidence.


It should be noted that in all but the first of the four examples
we assume some knowledge of the distribution: either actual
values of parameters or bounds on the p.d.f. or its derivatives. In
the first we avoid this by obtaining for a bounded distribution crude
estimates of the parameters. Some such assumption of knowledge
is inevitable, since all inequalities are expressed partly or wholly in
terms of population parameters and not the sample estimates of
them. Consequently we have the paradoxical situation that we can
use the inequalities most effectively when we know so much about
the population that use of the inequalities is unnecessary. At other
times we work under the tacit assumption that if our estimates are
not too far from the population values then our conclusions will not
be too far out; effectively, when we state a result such as "the
probability is at most ..." we are suppressing the supplementary
statement "and the first statement is true with probability ...".


A further use of the theory is to study the extent to which 
knowledge of a distribution is relevant. Thus in Section 2.6 it was 
found that the effect of moments above the fourth on the bounds 
L and U was small compared with the effect of the first four moments, 
and hence there would be little point in trying to estimate the higher 
moments.

EXERCISES


The chapter or section to which each exercise is most closely 
related is shown in brackets after the number of the exercise. The 
reader is recommended to construct and solve for himself examples 
such as those worked in Sections 2.4 and 3.5 and to verify the 
numerical results on pages 25, 26, 28 and 29. 


1. (2.3) If uy — 1 and T is 0 « x show that y has no infimum 
if | į | — 1. 


2. (2.6) If F,(x) and F,(x)are the distribution functions of 
distributions with the same first 27 moments prove that 


1 e Pn (a) ^ 4 (k) — Ba (k) 
4 


Fi (k) — F. (k) | < . eas 
Hn (a) . Hon (a) Un (R) -> Hon (k) 


for any value of a. (Khamis (1954).) 


3. (2.7) If $T$ is $x \ge k > 0$ and $\nu_1$ is given, prove that $U =
\nu_1/k$. (This is Markov's inequality.)


4. (2.9) h(x) is a positive function of x with minimum value 
H and is increasing for x > k 0. 


m= [7 UG) +F(—a)} h(a) dx, 


and T'is | x | >k. Prove that m, > H and sketch the region in which 
(ao, ai) lies if ag + a, h (x) > xr (x). Hence prove that U = min 
(1, (m, — H)/(h (k) — H), whence U < m,/h (k). (The last result 
was given by Cantelli (1910).)

5. (2.9) If $T$ is $0 < x < k$ and $\nu_n$, $\nu_{2n}$ are given, prove that
for O < kn « vn, L = 0, U = (va, — vi)/(van — và + ( — vn)?); 
for vn « k^ < valvas, L = I — (unk n), U =1; 
for von « kn vn, L = (k" — vn)? |( vən — v2 + (k^ — vay), U—1. 
(Cantelli (1928).) 
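
The three cases of Exercise 5 can be collected in one function, which makes it easy to confirm that the bounds join continuously where the cases meet. This is a sketch only: it reads the exercise as giving $L$ and $U$ for $\Pr(0 \le x \le k)$ in terms of $k^n$, $\nu_n$ and $\nu_{2n}$, assumes $\nu_{2n} > \nu_n^2$, and the function name is hypothetical.

```python
def cantelli_bounds(k_n, nu_n, nu_2n):
    """Return (L, U) for Pr(0 <= x <= k), where k_n stands for k**n.

    nu_n and nu_2n are the moments of order n and 2n; the variance of x**n,
    nu_2n - nu_n**2, is assumed to be strictly positive.
    """
    var = nu_2n - nu_n ** 2
    if var <= 0:
        raise ValueError("need nu_2n > nu_n**2")
    dev2 = (k_n - nu_n) ** 2
    if k_n <= nu_n:
        return 0.0, var / (var + dev2)
    if k_n * nu_n <= nu_2n:
        return 1.0 - nu_n / k_n, 1.0
    return dev2 / (var + dev2), 1.0


for k_n in (0.5, 1.5, 2.0, 3.0):
    print(k_n, cantelli_bounds(k_n, nu_n=1.0, nu_2n=2.0))
```

With $\nu_n = 1$ and $\nu_{2n} = 2$ the second and third cases meet at $k^n = 2$, where both expressions for $L$ give $\tfrac12$.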

6. (II) $n$ mutually independent trials have $s$ mutually exclusive 
results; $p_i$ is the probability of the $i$th result, and $q_i$ the observed 
relative frequency. Show, using Markov's inequality (Exercise 3), 
that 

\[ \Pr\Bigl\{\sum_i (p_i - q_i)^2 < \delta^2\Bigr\} \ge 1 - \frac{s - 1}{s n \delta^2}. \]

If $n'$ and $n''$ trials give $q_i'$, $q_i''$ for the relative frequencies, show that 

\[ \Pr\Bigl\{\sum_i (q_i' - q_i'')^2 < \delta^2\Bigr\} \ge 1 - \frac{(s - 1)(n' + n'')}{s n' n'' \delta^2}. \]

(Romanovski (1940).) 
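
The step behind Exercise 6 is that the expected squared distance between the observed frequencies and the probabilities is $\sum_i p_i(1 - p_i)/n$, which is at most $(s - 1)/(sn)$; Markov's inequality then gives the stated bound. The sketch below checks this with exact arithmetic. The probabilities, the number of trials and the squared threshold are arbitrary illustrative values, and the function names are not from the text.

```python
from fractions import Fraction


def expected_squared_distance(p, n):
    """Exact E[sum_i (q_i - p_i)^2] for relative frequencies q_i from n trials.

    Each q_i has variance p_i (1 - p_i) / n, so the expectation is their sum.
    """
    return sum(pi * (1 - pi) for pi in p) / n


def markov_lower_bound(s, n, threshold_sq):
    """1 - (s - 1) / (s * n * threshold_sq), the bound as read from the exercise."""
    return 1 - Fraction(s - 1, s * n) / threshold_sq


p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
n, s = 10, len(p)
mean_dist = expected_squared_distance(p, n)
worst_case = Fraction(s - 1, s * n)  # attained when every p_i equals 1/s
print(mean_dist, worst_case, markov_lower_bound(s, n, Fraction(1, 4)))
```

The worst case over the $p_i$ is the equiprobable one, which is why the bound does not involve the individual probabilities.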
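7. (II) If $x$ is a random variable in $(0, 2\pi)$ and $E(\sin x) = \alpha$, 
$E(\cos x) = \beta$, then 

\[ \Pr(2\theta < x < 2\phi) \ge \frac{\alpha \sin(\theta + \phi) + \beta \cos(\theta + \phi) - \cos(\phi - \theta)}{1 - \cos(\phi - \theta)}, \]

\[ \Pr(2\theta < x < 2\phi) \le \frac{\alpha \sin(\theta + \phi) + \beta \cos(\theta + \phi) + 1}{1 + \cos(\phi - \theta)}, \]

where $0 \le \theta < \phi \le \pi$. (Marshall and Olkin (1961).) 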
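
The two bounds of Exercise 7 can be tried on a discrete distribution. The sketch reads the exercise as giving a lower bound with denominator $1 - \cos(\phi - \theta)$ and an upper bound with denominator $1 + \cos(\phi - \theta)$, and it assumes $\phi - \theta < \pi$ so that neither denominator vanishes. The distribution and the function name are hypothetical.

```python
import math


def trig_bounds(alpha, beta, theta, phi):
    """Lower and upper bounds on Pr(2*theta < x < 2*phi).

    alpha = E(sin x), beta = E(cos x); assumes 0 <= theta < phi <= pi and
    phi - theta < pi so that neither denominator vanishes.
    """
    centre = alpha * math.sin(theta + phi) + beta * math.cos(theta + phi)
    c = math.cos(phi - theta)
    return (centre - c) / (1 - c), (centre + 1) / (1 + c)


# Hypothetical discrete distribution on (0, 2*pi), chosen only for illustration.
points = [1.0, 2.0, 4.0, 5.5]
probs = [0.1, 0.4, 0.3, 0.2]
alpha = sum(p * math.sin(x) for x, p in zip(points, probs))
beta = sum(p * math.cos(x) for x, p in zip(points, probs))

theta, phi = 0.75, 1.5
exact = sum(p for x, p in zip(points, probs) if 2 * theta < x < 2 * phi)
lower, upper = trig_bounds(alpha, beta, theta, phi)
print(lower, exact, upper)
```

The lower bound may be negative, and the upper bound may exceed one, in which case they carry no information. A unit mass at $x = \theta + \phi$ makes the lower bound equal to one.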


8. (II) If $\delta$ is the mean deviation about the mean $\mu$ and $T$ is 
$\mu - t_1 \delta \le x \le \mu + t_2 \delta$, then 

\[ P(T) \ge 1 - \tfrac{1}{2}\Bigl(\frac{1}{t_1} + \frac{1}{t_2}\Bigr). \]

Show that if $\delta$ is less than the standard deviation (by Schwarz's 
inequality it cannot be greater), then for values of $t_1$ and $t_2$ near unity 
the inequality using mean deviation is sharper than the one given 
in Section 2.5. (Glasser (1961).) 

9. (3.2) Show that if $f(x)$ has a single maximum at $x = 0$ and 
$\nu_n$ is given, then 

\[ L = k/\{(n + 1)\nu_n\}^{1/n} \quad \text{if } k \le n\{(n + 1)\nu_n\}^{1/n}/(n + 1); \]

\[ L = 1 - \nu_n n^n/\{k^n (n + 1)^n\} \quad \text{if } n\{(n + 1)\nu_n\}^{1/n}/(n + 1) \le k. \]

(The history of these inequalities is given in Fréchet (1950).) 
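
The two expressions of Exercise 9 can be checked against the rectangular distribution, for which the first is attained. The sketch assumes that $T$ is $|x| < k$, which is the reading under which these are the Gauss-Winckler bounds; the function name is illustrative.

```python
def unimodal_lower_bound(k, n, nu_n):
    """Lower bound L on Pr(|x| < k) for a density with a single maximum at 0.

    nu_n is the absolute moment of order n about the origin.
    """
    scale = ((n + 1) * nu_n) ** (1.0 / n)
    if k <= n * scale / (n + 1):
        return k / scale
    return 1.0 - nu_n * n ** n / (k ** n * (n + 1) ** n)


# Rectangular distribution on (-3, 3): nu_2 = 3 and Pr(|x| < k) = k / 3.
for k in (1.0, 2.0, 3.0):
    print(k, unimodal_lower_bound(k, n=2, nu_n=3.0), min(1.0, k / 3.0))
```

For $n = 2$ the cases meet at $k = 2$, where both expressions give $\tfrac23$. Below that point the rectangular distribution attains the bound exactly.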


10. (3.2) $x$ is a non-negative variable, $f(x)$ has a single 
maximum at $c$, $\nu_1$ is given, and $T$ is $0 \le x \le k$. By consideration of 

\[ \int_0^\infty x (x - c) f'(x)\, dx \]

prove that $c \le 2\nu_1$. 

Let $a = \{2k - c - 2\sqrt{k(k - c)}\}/c^2$. 
If $\nu_1 \le c \le k$ show that $L = 1 + (c - 2\nu_1) a$. 
If $c \le \nu_1$, $c \le k$ and $b = (k - c)/4(c - \nu_1)^2 \le a$, show that 

\[ L = 2b(\nu_1 - c). \]


If $k \le c \le 2\nu_1 - k$ show that $L = 0$. 
If $k \le c$, $2\nu_1 - k \le c$ show that $L = (k + c - 2\nu_1)/c$. 


11. (3.2) Ai = O, py = M, f(x) has a single maximum at 
x = k, and T is |x|<k. Prove that L = 64/81, U =1. 


12. (III) $x$ is a non-negative variable, 

\[ m^n = \int_0^\infty x^n f(x)\, dx, \]

and $z$ is defined as a function of $y$ by 

\[ y = \int_0^{zm} f(x)\, dx = F(zm). \]

Prove that $z$ is a non-decreasing function of $y$ and that 

\[ 1 = \int_0^1 z^n\, dy. \]

If $f(x)$ is non-decreasing for $0 \le x \le bm$ $(1 \le b \le (n + 1)^{1/n})$ prove 
that the graph of $z$ against $y$ is concave downwards for $0 \le z \le b$ 
and hence that 

\[
\begin{aligned}
0 &\le F(km) \le k/b && \text{for } 0 \le k \le b_1, \\
k/b - (b - k)(n + 1 - b^n)/\{b(b^n - k^n)\} &\le F(km) \le k/b && \text{for } b_1 \le k \le b, \\
(k^n - 1)(n + 1)/\{(n + 1)k^n - b^n\} &\le F(km) \le 1 && \text{for } b \le k,
\end{aligned}
\]

where $b_1$ is the positive root of $k(b^n - k^n) = (b - k)(n + 1 - b^n)$. 
(Narumi (1923); note that the restriction on $f(x)$ is over a range 
from the origin and not on the tail of the distribution for large $x$, as 
in the work of von Mises (1938).) 
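
The bounds of Exercise 12 can be evaluated numerically once $b_1$ has been found. The sketch below is a reading of the exercise, not a transcription of Narumi's method. It assumes $1 < b \le (n + 1)^{1/n}$, takes $b_1$ to be the root in $(0, 1)$ of $k(b^n - k^n) = (b - k)(n + 1 - b^n)$ (the trivial root $k = b$ being excluded), and locates it by bisection. The function names are hypothetical.

```python
def narumi_b1(n, b, tol=1e-12):
    """Root in (0, 1) of k (b^n - k^n) = (b - k)(n + 1 - b^n), by bisection.

    Assumes 1 < b <= (n + 1) ** (1 / n); the trivial root k = b is excluded.
    """
    def h(k):
        return k * (b ** n - k ** n) - (b - k) * (n + 1 - b ** n)

    lo, hi = 0.0, 1.0  # h(0) <= 0 and h(1) > 0 under the assumption on b
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) <= 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def narumi_bounds(k, n, b):
    """(lower, upper) bounds on F(k m) when f is non-decreasing on [0, b m]."""
    if k >= b:
        return (k ** n - 1) * (n + 1) / ((n + 1) * k ** n - b ** n), 1.0
    if k <= narumi_b1(n, b):
        return 0.0, k / b
    lower = k / b - (b - k) * (n + 1 - b ** n) / (b * (b ** n - k ** n))
    return lower, k / b


b1 = narumi_b1(2, 1.5)
print(b1, narumi_bounds(1.0, 2, 1.5), narumi_bounds(1.5, 2, 1.5))
```

When $b = (n + 1)^{1/n}$ the rectangular distribution on $(0, bm)$ satisfies the conditions, and the lower and upper bounds then coincide at $k/b$.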


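13. (III) Replace the condition in Exercise 12 by $f(x)$ non-increasing 
and show that 

\[ z \le n b^{n+1} y/\{(n + 1)(b^n - 1)\} \quad \text{for } 0 \le y \le 1 - b^{-n} \]

and hence that 

\[
\begin{aligned}
F(km) &\ge (n + 1)(b^n - 1)k/(n b^{n+1}) && \text{for } 0 \le k \le nb/(n + 1), \\
&\ge 1 - b^{-n} && \text{for } nb/(n + 1) \le k \le b, \\
&\ge 1 - k^{-n} && \text{for } b \le k.
\end{aligned}
\]

(Narumi (1923).) 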
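
The lower bound of Exercise 13 is piecewise, and a short function shows that the three pieces join continuously. The sketch assumes $b > 1$ and reads the power of $b$ in the first piece as $n + 1$, which is the value that makes the first two pieces agree at $k = nb/(n + 1)$. The function name is illustrative.

```python
def nonincreasing_lower_bound(k, n, b):
    """Lower bound on F(k m) when f is non-increasing on [0, b m], with b > 1."""
    if k >= b:
        return 1.0 - k ** (-n)
    if k >= n * b / (n + 1):
        return 1.0 - b ** (-n)
    return (n + 1) * (b ** n - 1) * k / (n * b ** (n + 1))


for k in (1.0, 4.0 / 3.0, 2.0, 3.0):
    print(k, nonincreasing_lower_bound(k, n=2, b=2.0))
```

The last piece is Markov's inequality applied to $x^n$; the first two use the concavity of $F$ over the range on which $f$ is non-increasing.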


14. (III) $x$ is a non-negative variable and $f(x)$ is non-increasing 
for $\nu_1 \le x$. Show that 


oo 1 = AS F — F,y 
i lilii a — 00 e Fi)? * mi 
where the tangent to the graph of $F(x)$ at $x = k$ meets $x = \nu_1$ at 
$(\nu_1, F_1)$. Deduce that 
F (tc) > 1 — 4 (e? — % (k — r). 
(Peek (1933).) 


15. (4.2) A matrix equivalent to 


—1 t t 
( t — 1 0 
t t —1 
was proposed for A by Lal (1955); show that this gives a best possible 


result only if all correlation coefficients are equal to 


(2 — 22)/(1 + 212). 
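
16. (4.8) $x_1, \ldots, x_{2n}$ are independent with means zero and 
variances $\sigma^2$. Prove that 

\[
\begin{aligned}
\Pr\Bigl\{\sum_{i=1}^{2n} x_i^2 \ge \lambda n \sigma^2\Bigr\} &\le 1 && \text{for } \lambda \le 2, \\
&\le 1/(\lambda - 1) && \text{for } 2 \le \lambda \le \tfrac{1}{2}(3 + \sqrt{5}), \\
&\le \frac{2}{\lambda}\Bigl(1 - \frac{1}{2\lambda}\Bigr) && \text{for } \tfrac{1}{2}(3 + \sqrt{5}) \le \lambda.
\end{aligned}
\]

(Birnbaum, Raymond and Zuckerman (1947).) 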


17. (4.8) If non-negative variables $x_1$ and $x_2$ have the same 
distribution with mean $\mu$ prove that 

\[
\begin{aligned}
\Pr\{(x_1 + x_2) \ge t\mu\} &\le (2/t)^2 && \text{for } 2 \le t \le 5/2, \\
&\le 2/t - 1/t^2 && \text{for } 5/2 \le t.
\end{aligned}
\]

Show that with its stronger hypothesis this gives an improvement 
over Birnbaum, Raymond and Zuckerman's inequality (Exercise 
16) for $5/2 < t < \tfrac{1}{2}(3 + \sqrt{5})$. 
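
The comparison asked for in Exercise 17 can be made numerically. The sketch reads the bound of Exercise 16 as $1$, $1/(\lambda - 1)$ and $2/\lambda - 1/\lambda^2$ on its three ranges, and the bound of Exercise 17 as $(2/t)^2$ and $2/t - 1/t^2$. It assumes the two variables of Exercise 17 are independent, and it returns 1 for $t < 2$, a case the exercise does not state. The function names are hypothetical.

```python
import math

UPPER_BREAK = (3 + math.sqrt(5)) / 2  # the value (3 + sqrt 5)/2 of the exercises


def brz_bound(lam):
    """Bound of Exercise 16 as read here, as a function of lambda."""
    if lam <= 2:
        return 1.0
    if lam <= UPPER_BREAK:
        return 1.0 / (lam - 1.0)
    return (2.0 / lam) * (1.0 - 1.0 / (2.0 * lam))


def same_distribution_bound(t):
    """Bound of Exercise 17 as read here; 1 is assumed for t < 2."""
    if t < 2:
        return 1.0
    if t <= 2.5:
        return (2.0 / t) ** 2
    return 2.0 / t - 1.0 / t ** 2


for t in (2.5, 2.55, 2.6, UPPER_BREAK):
    print(t, same_distribution_bound(t), brz_bound(t))
```

The two bounds agree at $t = \tfrac12(3 + \sqrt5)$, since $1/(t - 1) = 2/t - 1/t^2$ reduces to $t^2 - 3t + 1 = 0$, and beyond that point they are the same expression.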


18. (5.3) If $x_1$ and $x_2$ have probability $\tfrac{1}{2}$ at $x = a$ and $x = -a$, 
show that the proposition proved in (5.3) is false. (Birnbaum (1948).) 